From Jason Turner

[lex.literal]

Files changed (1)
  1. tmp/tmpben6t3zn/{from.md → to.md} +298 -282
tmp/tmpben6t3zn/{from.md → to.md} RENAMED
@@ -1,10 +1,10 @@
1
  ## Literals <a id="lex.literal">[[lex.literal]]</a>
2
 
3
  ### Kinds of literals <a id="lex.literal.kinds">[[lex.literal.kinds]]</a>
4
 
5
- There are several kinds of literals.[^11]
6
 
7
  ``` bnf
8
  literal:
9
  integer-literal
10
  character-literal
@@ -13,10 +13,13 @@ literal:
13
  boolean-literal
14
  pointer-literal
15
  user-defined-literal
16
  ```
17
 
 
 
 
18
  ### Integer literals <a id="lex.icon">[[lex.icon]]</a>
19
 
20
  ``` bnf
21
  integer-literal:
22
  binary-literal integer-suffixₒₚₜ
@@ -84,12 +87,14 @@ hexadecimal-digit: one of
84
 
85
  ``` bnf
86
  integer-suffix:
87
  unsigned-suffix long-suffixₒₚₜ
88
  unsigned-suffix long-long-suffixₒₚₜ
 
89
  long-suffix unsigned-suffixₒₚₜ
90
  long-long-suffix unsigned-suffixₒₚₜ
 
91
  ```
92
 
93
  ``` bnf
94
  unsigned-suffix: one of
95
  'u U'
@@ -103,10 +108,15 @@ long-suffix: one of
103
  ``` bnf
104
  long-long-suffix: one of
105
  'll LL'
106
  ```
107
 
 
 
 
 
 
108
  In an *integer-literal*, the sequence of *binary-digit*s,
109
  *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
110
  base N integer as shown in table [[lex.icon.base]]; the lexically first
111
  digit of the sequence of digits is the most significant.
112
 
@@ -131,16 +141,16 @@ decimal values ten through fifteen.
131
  `0x10'0000`, and `0'004'000'000` all have the same
132
  value. — *end example*]
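
As a minimal sketch of the example above (the constant and function names here are illustrative, not part of the standard text), the following compile-time checks assume a C++17 compiler and confirm that spellings in different bases, with or without digit separators, denote the same value:

```cpp
// Each spelling denotes 2^20. The single quotes are digit separators
// and are ignored when the value is determined.
constexpr auto decimal_plain = 1048576;
constexpr auto decimal_sep   = 1'048'576;
constexpr auto hexadecimal   = 0x10'0000;
constexpr auto octal         = 0'004'000'000;
constexpr auto binary        = 0b1'0000'0000'0000'0000'0000;

// Hypothetical helper used only to expose the result to a caller.
constexpr bool all_spellings_agree() {
    return decimal_plain == decimal_sep
        && decimal_sep == hexadecimal
        && hexadecimal == octal
        && octal == binary;
}

static_assert(all_spellings_agree());
```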
133
 
134
  The type of an *integer-literal* is the first type in the list in
135
  [[lex.icon.type]] corresponding to its optional *integer-suffix* in
136
- which its value can be represented. An *integer-literal* is a prvalue.
137
 
138
  **Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>
139
 
140
  | *integer-suffix* | *decimal-literal* | *integer-literal* other than *decimal-literal* |
141
- | ---------------- | ------------------------ | ---------------------------------------------- |
142
  | none | `int` | `int` |
143
  | | `long int` | `unsigned int` |
144
  | | `long long int` | `long int` |
145
  | | | `unsigned long int` |
146
  | | | `long long int` |
@@ -156,10 +166,15 @@ which its value can be represented. An *integer-literal* is a prvalue.
156
  | and `l` or `L` | `unsigned long long int` | `unsigned long long int` |
157
  | `ll` or `LL` | `long long int` | `long long int` |
158
  | | | `unsigned long long int` |
159
  | Both `u` or `U` | `unsigned long long int` | `unsigned long long int` |
160
  | and `ll` or `LL` | | |
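
The table can be spot-checked with `decltype`. This is a sketch assuming C++17; the block guarded by `INT_MAX` additionally assumes a 32-bit `int`, and the helper name is illustrative:

```cpp
#include <climits>
#include <type_traits>

// The suffix selects the list; the value selects the first type that fits.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned int>);
static_assert(std::is_same_v<decltype(1L), long>);
static_assert(std::is_same_v<decltype(1uL), unsigned long>);
static_assert(std::is_same_v<decltype(1LL), long long>);
static_assert(std::is_same_v<decltype(1uLL), unsigned long long>);

#if INT_MAX == 2147483647
// Assumption: int is 32 bits wide. A hexadecimal literal without a suffix
// may become unsigned int, but a decimal literal stays in the signed types.
static_assert(std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);
static_assert(std::is_signed_v<decltype(4294967295)>);
#endif

// Hypothetical helper used only to expose one result to a caller.
constexpr bool unsuffixed_small_literal_is_int() {
    return std::is_same_v<decltype(1), int>;
}
```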
 
 
 
 
 
161
 
162
 
163
  If an *integer-literal* cannot be represented by any type in its list
164
  and an extended integer type [[basic.fundamental]] can represent its
165
  value, it may have that extended integer type. If all of the types in
@@ -189,157 +204,165 @@ c-char-sequence:
189
  c-char-sequence c-char
190
  ```
191
 
192
  ``` bnf
193
  c-char:
194
- any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
195
  escape-sequence
196
  universal-character-name
197
  ```
198
 
 
 
 
 
 
 
199
  ``` bnf
200
  escape-sequence:
201
  simple-escape-sequence
202
  octal-escape-sequence
203
  hexadecimal-escape-sequence
204
  ```
205
 
206
  ``` bnf
207
- simple-escape-sequence: one of
208
- '\'' '\"' '\?' '\\'
209
- '\a' '\b' '\f' '\n' '\r' '\t' '\v'
210
  ```
211
 
212
  ``` bnf
213
  octal-escape-sequence:
214
  '\' octal-digit
215
  '\' octal-digit octal-digit
216
  '\' octal-digit octal-digit octal-digit
 
217
  ```
218
 
219
  ``` bnf
220
  hexadecimal-escape-sequence:
221
- '\x' hexadecimal-digit
222
- hexadecimal-escape-sequence hexadecimal-digit
223
  ```
224
 
225
- A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
226
- an *ordinary character literal*. An ordinary character literal that
227
- contains a single *c-char* representable in the execution character set
228
- has type `char`, with value equal to the numerical value of the encoding
229
- of the *c-char* in the execution character set. An ordinary character
230
- literal that contains more than one *c-char* is a
231
- *multicharacter literal*. A multicharacter literal, or an ordinary
232
- character literal containing a single *c-char* not representable in the
233
- execution character set, is conditionally-supported, has type `int`, and
234
- has an *implementation-defined* value.
235
-
236
- A *character-literal* that begins with `u8`, such as `u8'w'`, is a
237
- *character-literal* of type `char8_t`, known as a *UTF-8 character
238
- literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
239
- 10646 code point value, provided that the code point value can be
240
- encoded as a single UTF-8 code unit.
241
-
242
- [*Note 1*: That is, provided the code point value is in the range
243
- [0, 7F] (hexadecimal). — *end note*]
244
-
245
- If the value is not representable with a single UTF-8 code unit, the
246
- program is ill-formed. A UTF-8 character literal containing multiple
247
- *c-char*s is ill-formed.
248
-
249
- A *character-literal* that begins with the letter `u`, such as `u'x'`,
250
- is a *character-literal* of type `char16_t`, known as a *UTF-16
251
- character literal*. The value of a UTF-16 character literal is equal to
252
- its ISO/IEC 10646 code point value, provided that the code point value
253
- is representable with a single 16-bit code unit.
254
-
255
- [*Note 2*: That is, provided the code point value is in the range
256
- [0, FFFF] (hexadecimal). — *end note*]
257
-
258
- If the value is not representable with a single 16-bit code unit, the
259
- program is ill-formed. A UTF-16 character literal containing multiple
260
- *c-char*s is ill-formed.
261
-
262
- A *character-literal* that begins with the letter `U`, such as `U'y'`,
263
- is a *character-literal* of type `char32_t`, known as a *UTF-32
264
- character literal*. The value of a UTF-32 character literal containing a
265
- single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
266
- character literal containing multiple *c-char*s is ill-formed.
267
-
268
- A *character-literal* that begins with the letter `L`, such as `L'z'`,
269
- is a *wide-character literal*. A wide-character literal has type
270
- `wchar_t`.[^12] The value of a wide-character literal containing a
271
- single *c-char* has value equal to the numerical value of the encoding
272
- of the *c-char* in the execution wide-character set, unless the *c-char*
273
- has no representation in the execution wide-character set, in which case
274
- the value is *implementation-defined*.
275
-
276
- [*Note 3*: The type `wchar_t` is able to represent all members of the
277
- execution wide-character set (see 
278
- [[basic.fundamental]]). — *end note*]
279
-
280
- The value of a wide-character literal containing multiple *c-char*s is
281
- *implementation-defined*.
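
A short sketch of the types described above, assuming C++17 (the `char8_t` check only applies where the `__cpp_char8_t` feature-test macro is defined; the helper name is illustrative):

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype('a'), char>);       // ordinary
static_assert(std::is_same_v<decltype(u'x'), char16_t>);  // UTF-16
static_assert(std::is_same_v<decltype(U'y'), char32_t>);  // UTF-32
static_assert(std::is_same_v<decltype(L'z'), wchar_t>);   // wide
#ifdef __cpp_char8_t
static_assert(std::is_same_v<decltype(u8'w'), char8_t>);  // UTF-8
#endif

// The value of a UTF-16 or UTF-32 character literal is its code point.
static_assert(u'x' == 0x78);
static_assert(U'y' == 0x79);

// Hypothetical helper used only to expose a value to a caller.
constexpr char32_t code_point_of_y() { return U'y'; }
```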
282
-
283
- Certain non-graphic characters, the single quote `'`, the double quote
284
- `"`, the question mark `?`,[^13] and the backslash `\`, can be
285
- represented according to [[lex.ccon.esc]]. The double quote `"` and the
286
- question mark `?` can be represented as themselves or by the escape
287
- sequences `\"` and `\?` respectively, but the single quote `'` and the
288
- backslash `\` shall be represented by the escape sequences `\'` and `\\`
289
- respectively. Escape sequences in which the character following the
290
- backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
291
- with *implementation-defined* semantics. An escape sequence specifies a
292
- single character.
293
-
294
- **Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
295
-
296
- | | | |
297
- | --------------- | -------------- | ------------------ |
298
- | new-line | NL(LF) | `\n` |
299
- | horizontal tab | HT | `\t` |
300
- | vertical tab | VT | `\v` |
301
- | backspace | BS | `\b` |
302
- | carriage return | CR | `\r` |
303
- | form feed | FF | `\f` |
304
- | alert | BEL | `\a` |
305
- | backslash | \ | `\\` |
306
- | question mark | ? | `\?` |
307
- | single quote | `'` | `\'` |
308
- | double quote | `"` | `\"` |
309
- | octal number | *ooo* | `\ooo` |
310
- | hex number | *hhh* | `\xhhh` |
311
-
312
-
313
- The escape `\ooo` consists of the backslash followed by one,
314
- two, or three octal digits that are taken to specify the value of the
315
- desired character. The escape `\xhhh` consists of the
316
- backslash followed by `x` followed by one or more hexadecimal digits
317
- that are taken to specify the value of the desired character. There is
318
- no limit to the number of digits in a hexadecimal sequence. A sequence
319
- of octal or hexadecimal digits is terminated by the first character that
320
- is not an octal digit or a hexadecimal digit, respectively. The value of
321
- a *character-literal* is *implementation-defined* if it falls outside of
322
- the *implementation-defined* range defined for `char` (for
323
- *character-literal*s with no prefix) or `wchar_t` (for
324
- *character-literal*s prefixed by `L`).
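
The digit rules for numeric escapes can be illustrated with a few compile-time checks. This is a sketch assuming C++17; the helper name is illustrative:

```cpp
// Octal and hexadecimal escapes specify a value directly.
static_assert('\101' == '\x41');
static_assert('\x41' == 65);

// An octal escape takes at most three digits, so "\1234" is the
// character '\123' followed by the character '4' (plus the null).
static_assert(sizeof("\1234") == 3);
static_assert("\1234"[1] == '4');

// A hexadecimal escape has no digit limit: every hex digit is consumed.
static_assert(sizeof("\x00041") == 2);

// The double quote and question mark may be written either way
// inside a character literal.
static_assert('\"' == '"');
static_assert('\?' == '?');

// Hypothetical helper used only to expose a value to a caller.
constexpr int octal_101() { return '\101'; }
```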
325
 
326
- [*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
327
- or `U` is outside the range defined for its type, the program is
328
- ill-formed. *end note*]
 
329
 
330
- A *universal-character-name* is translated to the encoding, in the
331
- appropriate execution character set, of the character named. If there is
332
- no such encoding, the *universal-character-name* is translated to an
333
- *implementation-defined* encoding.
334
 
335
- [*Note 5*: In translation phase 1, a *universal-character-name* is
336
- introduced whenever an actual extended character is encountered in the
337
- source text. Therefore, all extended characters are described in terms
338
- of *universal-character-name*s. However, the actual compiler
339
- implementation may use its own native character set, so long as the same
340
- results are obtained. — *end note*]
341
 
342
  ### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>
343
 
344
  ``` bnf
345
  floating-point-literal:
@@ -394,23 +417,33 @@ digit-sequence:
394
  digit-sequence '''ₒₚₜ digit
395
  ```
396
 
397
  ``` bnf
398
  floating-point-suffix: one of
399
- 'f l F L'
400
  ```
401
 
402
- The type of a *floating-point-literal* is determined by its
 
403
  *floating-point-suffix* as specified in [[lex.fcon.type]].
404
 
 
 
 
 
405
  **Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>
406
 
407
  | *floating-point-suffix* | type |
408
- | ----------------------- | --------------- |
409
  | none | `double` |
410
  | `f` or `F` | `float` |
411
  | `l` or `L` | `long double` |
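
The suffix-to-type mapping in the table can be checked directly. A minimal sketch assuming C++17, with an illustrative helper name:

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype(1.0), double>);        // no suffix
static_assert(std::is_same_v<decltype(1.0f), float>);        // f or F
static_assert(std::is_same_v<decltype(1.0F), float>);
static_assert(std::is_same_v<decltype(1.0l), long double>);  // l or L
static_assert(std::is_same_v<decltype(1.0L), long double>);

// Hypothetical helper used only to expose one result to a caller.
constexpr bool unsuffixed_is_double() {
    return std::is_same_v<decltype(1.0), double>;
}
```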
 
 
 
 
 
412
 
413
 
414
  The *significand* of a *floating-point-literal* is the
415
  *fractional-constant* or *digit-sequence* of a
416
  *decimal-floating-point-literal* or the
@@ -419,11 +452,11 @@ The *significand* of a *floating-point-literal* is the
419
  of *digit*s or *hexadecimal-digit*s and optional period are interpreted
420
  as a base N real number s, where N is 10 for a
421
  *decimal-floating-point-literal* and 16 for a
422
  *hexadecimal-floating-point-literal*.
423
 
424
- [*Note 1*: Any optional separating single quotes are ignored when
425
  determining the value. — *end note*]
426
 
427
  If an *exponent-part* or *binary-exponent-part* is present, the exponent
428
  e of the *floating-point-literal* is the result of interpreting the
429
  sequence of an optional *sign* and the *digit*s as a base 10 integer.
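
A sketch of how the significand and exponent combine, assuming C++17 (needed for hexadecimal floating-point literals). The values are chosen to be exactly representable, and the helper name is illustrative:

```cpp
// Digit separators are ignored when determining the value.
static_assert(1'000.5 == 1000.5);

// Decimal form: significand 1.5, exponent 2, scaled by a power of 10.
static_assert(1.5e2 == 150.0);

// Hexadecimal form: the significand is base 16, the exponent digits are
// still base 10, and the scale factor is a power of 2.
static_assert(0x1.8p1 == 3.0);     // 1.5 * 2^1
static_assert(0x10p-4 == 1.0);     // 16 * 2^-4
static_assert(0x1p10 == 1024.0);   // exponent is decimal 10, not hex 10

// Hypothetical helper used only to expose a value to a caller.
constexpr double hex_float_value() { return 0x1p10; }
```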
@@ -455,15 +488,21 @@ s-char-sequence:
455
  s-char-sequence s-char
456
  ```
457
 
458
  ``` bnf
459
  s-char:
460
- any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
461
  escape-sequence
462
  universal-character-name
463
  ```
464
 
 
 
 
 
 
 
465
  ``` bnf
466
  raw-string:
467
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
468
  ```
469
 
@@ -473,27 +512,43 @@ r-char-sequence:
473
  r-char-sequence r-char
474
  ```
475
 
476
  ``` bnf
477
  r-char:
478
- any member of the source character set, except a right parenthesis ')' followed by
479
- the initial *d-char-sequence* (which may be empty) followed by a double quote '"'.
480
  ```
481
 
482
  ``` bnf
483
  d-char-sequence:
484
  d-char
485
  d-char-sequence d-char
486
  ```
487
 
488
  ``` bnf
489
  d-char:
490
- any member of the basic source character set except:
491
- space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
492
- representing horizontal tab, vertical tab, form feed, and newline.
493
  ```
494
 
495
  A *string-literal* that has an `R` in the prefix is a *raw string
496
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
497
  *d-char-sequence* of a *raw-string* is the same sequence of characters
498
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
499
  at most 16 characters.
@@ -536,149 +591,130 @@ R"(x = "\"y\"")"
536
 
537
  is equivalent to `"x = \"\\\"y\\\"\""`.
538
 
539
  — *end example*]
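
The equivalence stated in the example can be checked at run time. The sketch below uses hypothetical helper names; the second helper also shows a non-empty delimiter (here `delim`, an arbitrary choice within the 16-character limit) allowing `)"` to appear inside the raw string:

```cpp
#include <cstddef>
#include <cstring>

// Compares the raw string from the example with its escaped spelling.
inline bool raw_matches_escaped() {
    const char* raw     = R"(x = "\"y\"")";
    const char* escaped = "x = \"\\\"y\\\"\"";
    return std::strcmp(raw, escaped) == 0;
}

// With the delimiter "delim", only )delim" ends the literal, so the
// contents are the four characters a ) " b.
inline std::size_t delimited_length() {
    return std::strlen(R"delim(a)"b)delim");
}
```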
540
 
541
- After translation phase 6, a *string-literal* that does not begin with
542
- an *encoding-prefix* is an *ordinary string literal*. An ordinary string
543
- literal has type “array of *n* `const char`” where *n* is the size of
544
- the string as defined below, has static storage duration [[basic.stc]],
545
- and is initialized with the given characters.
546
-
547
- A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
548
- *UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
549
- `const char8_t`”, where *n* is the size of the string as defined below;
550
- each successive element of the object representation [[basic.types]] has
551
- the value of the corresponding code unit of the UTF-8 encoding of the
552
- string.
553
-
554
  Ordinary string literals and UTF-8 string literals are also referred to
555
  as narrow string literals.
556
 
557
- A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
558
- string literal*. A UTF-16 string literal has type “array of *n*
559
- `const char16_t`”, where *n* is the size of the string as defined below;
560
- each successive element of the array has the value of the corresponding
561
- code unit of the UTF-16 encoding of the string.
 
562
 
563
- [*Note 3*: A single *c-char* may produce more than one `char16_t`
564
- character in the form of surrogate pairs. A surrogate pair is a
565
- representation for a single code point as a sequence of two 16-bit code
566
- units. — *end note*]
567
-
568
- A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
569
- string literal*. A UTF-32 string literal has type “array of *n*
570
- `const char32_t`”, where *n* is the size of the string as defined below;
571
- each successive element of the array has the value of the corresponding
572
- code unit of the UTF-32 encoding of the string.
573
-
574
- A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
575
- string literal*. A wide string literal has type “array of *n* `const
576
- wchar_t`”, where *n* is the size of the string as defined below; it is
577
- initialized with the given characters.
578
 
579
  In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
580
- concatenated. If both *string-literal*s have the same *encoding-prefix*,
581
- the resulting concatenated *string-literal* has that *encoding-prefix*.
582
- If one *string-literal* has no *encoding-prefix*, it is treated as a
583
- *string-literal* of the same *encoding-prefix* as the other operand. If
584
- a UTF-8 string literal token is adjacent to a wide string literal token,
585
- the program is ill-formed. Any other concatenations are
586
- conditionally-supported with *implementation-defined* behavior.
587
-
588
- [*Note 4*: This concatenation is an interpretation, not a conversion.
589
- Because the interpretation happens in translation phase 6 (after each
590
- character from a *string-literal* has been translated into a value from
591
- the appropriate character set), a *string-literal*’s initial rawness has
592
- no effect on the interpretation or well-formedness of the
593
- concatenation. — *end note*]
 
 
 
 
 
594
 
595
  [[lex.string.concat]] has some examples of valid concatenations.
596
 
 
 
597
  **Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
598
 
599
  | | | | | | |
600
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
601
  | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means |
602
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
603
  | `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
604
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
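
A few rows of the table, checked with `decltype` (a sketch assuming C++17; a string literal is an lvalue, so `decltype` yields a reference to the array type; the helper name is illustrative):

```cpp
#include <type_traits>

// An unprefixed literal takes on the encoding-prefix of its neighbour.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"a" L"b"), const wchar_t (&)[3]>);

// Hypothetical helper: second code unit of the concatenated literal.
constexpr char16_t second_unit() { return (u"a" "b")[1]; }
static_assert(second_unit() == u'b');
```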
605
 
606
 
607
- Characters in concatenated strings are kept distinct.
608
-
609
- [*Example 2*:
610
-
611
- ``` cpp
612
- "\xA" "B"
613
- ```
614
-
615
- contains the two characters `'\xA'` and `'B'` after concatenation (and
616
- not the single hexadecimal character `'\xAB'`).
617
-
618
- — *end example*]
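
The same point expressed as compile-time checks (a sketch assuming C++17; the helper name is illustrative):

```cpp
#include <cstddef>

// Concatenation keeps the two characters distinct: '\xA', 'B', '\0'.
static_assert(sizeof("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == '\xA');
static_assert(("\xA" "B")[1] == 'B');

// Written as one literal, the hex escape consumes both digits.
static_assert(sizeof("\xAB") == 2);

// Hypothetical helper used only to expose the size to a caller.
constexpr std::size_t concatenated_size() { return sizeof("\xA" "B"); }
```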
619
-
620
- After any necessary concatenation, in translation phase 7
621
- [[lex.phases]], `'\0'` is appended to every *string-literal* so that
622
- programs that scan a string can find its end.
623
-
624
- Escape sequences and *universal-character-name*s in non-raw string
625
- literals have the same meaning as in *character-literal*s [[lex.ccon]],
626
- except that the single quote `'` is representable either by itself or by
627
- the escape sequence `\'`, and the double quote `"` shall be preceded by
628
- a `\`, and except that a *universal-character-name* in a UTF-16 string
629
- literal may yield a surrogate pair. In a narrow string literal, a
630
- *universal-character-name* may map to more than one `char` or `char8_t`
631
- element due to *multibyte encoding*. The size of a `char32_t` or wide
632
- string literal is the total number of escape sequences,
633
- *universal-character-name*s, and other characters, plus one for the
634
- terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
635
- the total number of escape sequences, *universal-character-name*s, and
636
- other characters, plus one for each character requiring a surrogate
637
- pair, plus one for the terminating `u'\0'`.
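
The size rules can be made concrete with U+1F600, a code point outside the 16-bit range, and U+00E9, which needs two UTF-8 code units. This is a sketch assuming C++17, with illustrative helper names:

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a string literal's array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// UTF-32: one element per code point, plus the terminator.
static_assert(element_count(U"\U0001F600") == 2);

// UTF-16: the code point needs a surrogate pair, plus the terminator.
static_assert(element_count(u"\U0001F600") == 3);
static_assert(u"\U0001F600"[0] == 0xD83D);
static_assert(u"\U0001F600"[1] == 0xDE00);

// UTF-8: U+00E9 is encoded as two code units, plus the terminator.
static_assert(element_count(u8"\u00E9") == 3);

// Ordinary: four characters plus the terminator.
static_assert(element_count("asdf") == 5);

// Hypothetical helper used only to expose one size to a caller.
constexpr std::size_t utf16_size() { return element_count(u"\U0001F600"); }
```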
638
-
639
- [*Note 5*: The size of a `char16_t` string literal is the number of
640
- code units, not the number of characters. — *end note*]
641
-
642
- [*Note 6*: Any *universal-character-name*s are required to correspond
643
- to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
644
- [[lex.charset]]. — *end note*]
645
-
646
- The size of a narrow string literal is the total number of escape
647
- sequences and other characters, plus at least one for the multibyte
648
- encoding of each *universal-character-name*, plus one for the
649
- terminating `'\0'`.
650
-
651
  Evaluating a *string-literal* results in a string literal object with
652
- static storage duration, initialized from the given characters as
653
- specified above. Whether all *string-literal*s are distinct (that is,
654
- are stored in nonoverlapping objects) and whether successive evaluations
655
- of a *string-literal* yield the same or a different object is
656
- unspecified.
657
-
658
- [*Note 7*: The effect of attempting to modify a *string-literal* is
659
- undefined. — *end note*]
660
 
661
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
662
 
663
  ``` bnf
664
  boolean-literal:
665
  'false'
666
  'true'
667
  ```
668
 
669
  The Boolean literals are the keywords `false` and `true`. Such literals
670
- are prvalues and have type `bool`.
671
 
672
  ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
673
 
674
  ``` bnf
675
  pointer-literal:
676
  'nullptr'
677
  ```
678
 
679
- The pointer literal is the keyword `nullptr`. It is a prvalue of type
680
  `std::nullptr_t`.
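
A brief sketch of these properties, assuming C++17 (the helper name is illustrative):

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has its own type, which is not itself a pointer type.
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

// It converts to a null pointer value of any pointer type.
inline bool converts_to_null_pointer() {
    int* p = nullptr;
    const char* q = nullptr;
    return p == nullptr && q == nullptr;
}
```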
681
 
682
  [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
683
  pointer type nor a pointer-to-member type; rather, a prvalue of this
684
  type is a null pointer constant and can be converted to a null pointer
@@ -742,14 +778,13 @@ The syntactic non-terminal preceding the *ud-suffix* in a
742
  that could match that non-terminal.
743
 
744
  A *user-defined-literal* is treated as a call to a literal operator or
745
  literal operator template [[over.literal]]. To determine the form of
746
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
747
- the *literal-operator-id* whose literal suffix identifier is *X* is
748
- looked up in the context of *L* using the rules for unqualified name
749
- lookup [[basic.lookup.unqual]]. Let *S* be the set of declarations found
750
- by this lookup. *S* shall not be empty.
751
 
752
  If *L* is a *user-defined-integer-literal*, let *n* be the literal
753
  without its *ud-suffix*. If *S* contains a literal operator with
754
  parameter type `unsigned long long`, the literal *L* is treated as a
755
  call of the form
@@ -761,11 +796,11 @@ operator "" X(nULL)
761
  Otherwise, *S* shall contain a raw literal operator or a numeric literal
762
  operator template [[over.literal]] but not both. If *S* contains a raw
763
  literal operator, the literal *L* is treated as a call of the form
764
 
765
  ``` cpp
766
- operator "" X("n")
767
  ```
768
 
769
  Otherwise (*S* contains a numeric literal operator template), *L* is
770
  treated as a call of the form
771
 
@@ -774,11 +809,11 @@ operator "" X<'c₁', 'c₂', ... 'cₖ'>()
774
  ```
775
 
776
  where *n* is the source character sequence c₁c₂...cₖ.
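
The three call forms for a *user-defined-integer-literal* can be sketched as follows. The suffixes `_cooked`, `_raw`, and `_chars` are hypothetical names chosen for illustration, and the code assumes C++17:

```cpp
#include <cstddef>

// Form 1: literal operator taking unsigned long long; 42_cooked is
// treated as operator""_cooked(42ULL).
constexpr unsigned long long operator""_cooked(unsigned long long v) {
    return v;
}

// Form 2: raw literal operator; 42_raw is treated as operator""_raw("42").
// Here it simply returns the length of the spelling.
constexpr std::size_t operator""_raw(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Form 3: numeric literal operator template; 1234_chars is treated as
// operator""_chars<'1', '2', '3', '4'>().
template <char... Cs>
constexpr std::size_t operator""_chars() {
    return sizeof...(Cs);
}

static_assert(42_cooked == 42);
static_assert(42_raw == 2);
static_assert(1234_chars == 4);
```

Each suffix declares only one kind of operator, which keeps the lookup set *S* unambiguous for each literal.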
777
 
778
  [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
779
- basic source character set. — *end note*]
780
 
781
  If *L* is a *user-defined-floating-point-literal*, let *f* be the
782
  literal without its *ud-suffix*. If *S* contains a literal operator with
783
  parameter type `long double`, the literal *L* is treated as a call of
784
  the form
@@ -790,11 +825,11 @@ operator "" X(fL)
790
  Otherwise, *S* shall contain a raw literal operator or a numeric literal
791
  operator template [[over.literal]] but not both. If *S* contains a raw
792
  literal operator, the *literal* *L* is treated as a call of the form
793
 
794
  ``` cpp
795
- operator "" X("f")
796
  ```
797
 
798
  Otherwise (*S* contains a numeric literal operator template), *L* is
799
  treated as a call of the form
800
 
@@ -803,11 +838,11 @@ operator "" X<'c₁', 'c₂', ... 'cₖ'>()
803
  ```
804
 
805
  where *f* is the source character sequence c₁c₂...cₖ.
806
 
807
  [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
808
- basic source character set. — *end note*]
809
 
810
  If *L* is a *user-defined-string-literal*, let *str* be the literal
811
  without its *ud-suffix* and let *len* be the number of code units in
812
  *str* (i.e., its length excluding the terminating null character). If
813
  *S* contains a literal operator template with a non-type template
@@ -861,39 +896,43 @@ suffix is applied to the result of the concatenation.
861
 
862
  [*Example 3*:
863
 
864
  ``` cpp
865
  int main() {
866
- L"A" "B" "C"_x; // OK: same as L"ABC"_x
867
  "P"_x "Q" "R"_y; // error: two different ud-suffixes
868
  }
869
  ```
870
 
871
  — *end example*]
872
 
873
  <!-- Link reference definitions -->
 
874
  [basic.fundamental]: basic.md#basic.fundamental
875
  [basic.link]: basic.md#basic.link
876
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
877
  [basic.stc]: basic.md#basic.stc
878
- [basic.types]: basic.md#basic.types
879
  [conv.mem]: expr.md#conv.mem
880
  [conv.ptr]: expr.md#conv.ptr
881
  [cpp]: cpp.md#cpp
882
- [cpp.concat]: cpp.md#cpp.concat
883
  [cpp.cond]: cpp.md#cpp.cond
884
  [cpp.import]: cpp.md#cpp.import
885
  [cpp.include]: cpp.md#cpp.include
886
  [cpp.module]: cpp.md#cpp.module
887
  [cpp.stringize]: cpp.md#cpp.stringize
888
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 
889
  [headers]: library.md#headers
890
  [lex]: #lex
891
  [lex.bool]: #lex.bool
892
  [lex.ccon]: #lex.ccon
893
  [lex.ccon.esc]: #lex.ccon.esc
 
894
  [lex.charset]: #lex.charset
 
 
895
  [lex.comment]: #lex.comment
896
  [lex.digraph]: #lex.digraph
897
  [lex.ext]: #lex.ext
898
  [lex.fcon]: #lex.fcon
899
  [lex.fcon.type]: #lex.fcon.type
@@ -904,83 +943,60 @@ int main() {
904
  [lex.key]: #lex.key
905
  [lex.key.digraph]: #lex.key.digraph
906
  [lex.literal]: #lex.literal
907
  [lex.literal.kinds]: #lex.literal.kinds
908
  [lex.name]: #lex.name
909
- [lex.name.allowed]: #lex.name.allowed
910
- [lex.name.disallowed]: #lex.name.disallowed
911
  [lex.name.special]: #lex.name.special
912
  [lex.nullptr]: #lex.nullptr
913
  [lex.operators]: #lex.operators
914
  [lex.phases]: #lex.phases
915
  [lex.ppnumber]: #lex.ppnumber
916
  [lex.pptoken]: #lex.pptoken
917
  [lex.separate]: #lex.separate
918
  [lex.string]: #lex.string
919
  [lex.string.concat]: #lex.string.concat
 
920
  [lex.token]: #lex.token
921
  [module.import]: module.md#module.import
922
  [module.unit]: module.md#module.unit
923
  [over.literal]: over.md#over.literal
 
924
  [temp.explicit]: temp.md#temp.explicit
925
  [temp.names]: temp.md#temp.names
926
 
927
- [^1]: Implementations must behave as if these separate phases occur,
928
- although in practice different phases might be folded together.
929
 
930
  [^2]: A partial preprocessing token would arise from a source file
931
  ending in the first portion of a multi-character token that requires
932
  a terminating sequence of characters, such as a *header-name* that
933
  is missing the closing `"` or `>`. A partial comment would arise
934
  from a source file ending with an unclosed `/*` comment.
935
 
936
- [^3]: An implementation need not convert all non-corresponding source
937
- characters to the same execution character.
938
-
939
- [^4]: The glyphs for the members of the basic source character set are
940
- intended to identify characters from the subset of ISO/IEC 10646
941
- which corresponds to the ASCII character set. However, because the
942
- mapping from source file characters to the source character set
943
- (described in translation phase 1) is specified as
944
- *implementation-defined*, an implementation is required to document
945
- how the basic source characters are represented in source files.
946
-
947
- [^5]: A sequence of characters resembling a *universal-character-name*
948
- in an *r-char-sequence* [[lex.string]] does not form a
949
- *universal-character-name*.
950
-
951
- [^6]: These include “digraphs” and additional reserved words. The term
952
  “digraph” (token consisting of two characters) is not perfectly
953
  descriptive, since one of the alternative *preprocessing-token*s is
954
  `%:%:` and of course several primary tokens contain two characters.
955
  Nonetheless, those alternative tokens that aren’t lexical keywords
956
  are colloquially known as “digraphs”.
957
 
958
- [^7]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
959
  will be different, maintaining the source spelling, but the tokens
960
  can otherwise be freely interchanged.
961
 
962
- [^8]: Literals include strings and character and numeric literals.
963
 
964
- [^9]: Thus, a sequence of characters that resembles an escape sequence
965
- might result in an error, be interpreted as the character
966
  corresponding to the escape sequence, or have a completely different
967
  meaning, depending on the implementation.
968
 
969
- [^10]: On systems in which linkers cannot accept extended characters, an
970
- encoding of the *universal-character-name* may be used in forming
971
  valid external identifiers. For example, some otherwise unused
972
- character or sequence of characters may be used to encode the `\u`
973
- in a *universal-character-name*. Extended characters may produce a
974
  long external identifier, but C++ does not place a translation limit
975
- on significant characters for external identifiers. In C++, upper-
976
- and lower-case letters are considered different for all identifiers,
977
- including external identifiers.
978
 
979
- [^11]: The term “literal” generally designates, in this document, those
980
  tokens that are called “constants” in ISO C.
981
-
982
- [^12]: They are intended for character sets where a character does not
983
- fit into a single byte.
984
-
985
- [^13]: Using an escape sequence for a question mark is supported for
986
- compatibility with ISO C++14 and ISO C.
 
1
  ## Literals <a id="lex.literal">[[lex.literal]]</a>
2
 
3
  ### Kinds of literals <a id="lex.literal.kinds">[[lex.literal.kinds]]</a>
4
 
5
+ There are several kinds of literals.[^8]
6
 
7
  ``` bnf
8
  literal:
9
  integer-literal
10
  character-literal
 
13
  boolean-literal
14
  pointer-literal
15
  user-defined-literal
16
  ```
17
 
18
+ [*Note 1*: When appearing as an *expression*, a literal has a type and
19
+ a value category [[expr.prim.literal]]. — *end note*]
20
+
21
  ### Integer literals <a id="lex.icon">[[lex.icon]]</a>
22
 
23
  ``` bnf
24
  integer-literal:
25
  binary-literal integer-suffixₒₚₜ
 
87
 
88
  ``` bnf
89
  integer-suffix:
90
  unsigned-suffix long-suffixₒₚₜ
91
  unsigned-suffix long-long-suffixₒₚₜ
92
+ unsigned-suffix size-suffixₒₚₜ
93
  long-suffix unsigned-suffixₒₚₜ
94
  long-long-suffix unsigned-suffixₒₚₜ
95
+ size-suffix unsigned-suffixₒₚₜ
96
  ```
97
 
98
  ``` bnf
99
  unsigned-suffix: one of
100
  'u U'
 
108
  ``` bnf
109
  long-long-suffix: one of
110
  'll LL'
111
  ```
112
 
113
+ ``` bnf
114
+ size-suffix: one of
115
+ 'z Z'
116
+ ```
117
+
118
  In an *integer-literal*, the sequence of *binary-digit*s,
119
  *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
120
  base N integer as shown in table [[lex.icon.base]]; the lexically first
121
  digit of the sequence of digits is the most significant.
122
 
 
141
  `0x10'0000`, and `0'004'000'000` all have the same
142
  value. — *end example*]
143
 
144
  The type of an *integer-literal* is the first type in the list in
145
  [[lex.icon.type]] corresponding to its optional *integer-suffix* in
146
+ which its value can be represented.
147
 
148
  **Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>
149
 
150
  | *integer-suffix* | *decimal-literal* | *integer-literal* other than *decimal-literal* |
151
+ | ---------------- | ----------------------------------------- | ---------------------------------------------- |
152
  | none | `int` | `int` |
153
  | | `long int` | `unsigned int` |
154
  | | `long long int` | `long int` |
155
  | | | `unsigned long int` |
156
  | | | `long long int` |
 
166
  | and `l` or `L` | `unsigned long long int` | `unsigned long long int` |
167
  | `ll` or `LL` | `long long int` | `long long int` |
168
  | | | `unsigned long long int` |
169
  | Both `u` or `U` | `unsigned long long int` | `unsigned long long int` |
170
  | and `ll` or `LL` | | |
171
+ | `z` or `Z` | the signed integer type corresponding | the signed integer type |
172
+ | | to `std::size_t` [[support.types.layout]] | corresponding to `std::size_t` |
173
+ | | | `std::size_t` |
174
+ | Both `u` or `U` | `std::size_t` | `std::size_t` |
175
+ | and `z` or `Z` | | |
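
A sketch of the new rows. Support for the `z` suffix is detected through the `__cpp_size_t_suffix` feature-test macro, so the checks are skipped on compilers without it. The helper name is illustrative, and the use of `std::make_signed_t` to name "the signed integer type corresponding to `std::size_t`" is an assumption:

```cpp
#include <cstddef>
#include <type_traits>

#if defined(__cpp_size_t_suffix)
// uz (in either order, either case) gives std::size_t.
static_assert(std::is_same_v<decltype(0uz), std::size_t>);
static_assert(std::is_same_v<decltype(0ZU), std::size_t>);
// z alone gives the corresponding signed type for a decimal literal.
static_assert(std::is_same_v<decltype(0z), std::make_signed_t<std::size_t>>);
#endif

// Hypothetical helper: uses the suffix where available, a cast otherwise.
constexpr std::size_t three() {
#if defined(__cpp_size_t_suffix)
    return 3uz;
#else
    return static_cast<std::size_t>(3);
#endif
}
```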
176
 
177
 
178
  If an *integer-literal* cannot be represented by any type in its list
179
  and an extended integer type [[basic.fundamental]] can represent its
180
  value, it may have that extended integer type. If all of the types in
 
204
  c-char-sequence c-char
205
  ```
206
 
207
  ``` bnf
208
  c-char:
209
+ basic-c-char
210
  escape-sequence
211
  universal-character-name
212
  ```
213
 
214
+ ``` bnf
215
+ basic-c-char:
216
+ any member of the translation character set except the U+0027 (apostrophe),
217
+ U+005c (reverse solidus), or new-line character
218
+ ```
219
+
220
  ``` bnf
221
  escape-sequence:
222
  simple-escape-sequence
223
+ numeric-escape-sequence
224
+ conditional-escape-sequence
225
+ ```
226
+
227
+ ``` bnf
228
+ simple-escape-sequence:
229
+ '\' simple-escape-sequence-char
230
+ ```
231
+
232
+ ``` bnf
233
+ simple-escape-sequence-char: one of
234
+ '' " ? \ a b f n r t v'
235
+ ```
236
+
237
+ ``` bnf
238
+ numeric-escape-sequence:
239
  octal-escape-sequence
240
  hexadecimal-escape-sequence
241
  ```
242
 
243
  ``` bnf
244
+ simple-octal-digit-sequence:
245
+ octal-digit
246
+ simple-octal-digit-sequence octal-digit
247
  ```
248
 
249
  ``` bnf
250
  octal-escape-sequence:
251
  '\' octal-digit
252
  '\' octal-digit octal-digit
253
  '\' octal-digit octal-digit octal-digit
254
+ '\o{' simple-octal-digit-sequence '}'
255
  ```
256
 
257
  ``` bnf
258
  hexadecimal-escape-sequence:
259
+ '\x' simple-hexadecimal-digit-sequence
260
+ '\x{' simple-hexadecimal-digit-sequence '}'
261
  ```
262
 
263
+ ``` bnf
264
+ conditional-escape-sequence:
265
+ '\' conditional-escape-sequence-char
266
+ ```
267
 
268
+ ``` bnf
269
+ conditional-escape-sequence-char:
270
+ any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
271
+ ```
272
 
273
+ A *non-encodable character literal* is a *character-literal* whose
274
+ *c-char-sequence* consists of a single *c-char* that is not a
275
+ *numeric-escape-sequence* and that specifies a character that either
276
+ lacks representation in the literal’s associated character encoding or
277
+ that cannot be encoded as a single code unit. A *multicharacter literal*
278
+ is a *character-literal* whose *c-char-sequence* consists of more than
279
+ one *c-char*. The *encoding-prefix* of a non-encodable character literal
280
+ or a multicharacter literal shall be absent. Such *character-literal*s
281
+ are conditionally-supported.
282
+
283
+ The kind of a *character-literal*, its type, and its associated
284
+ character encoding [[lex.charset]] are determined by its
285
+ *encoding-prefix* and its *c-char-sequence* as defined by
286
+ [[lex.ccon.literal]]. The special cases for non-encodable character
287
+ literals and multicharacter literals take precedence over the base kind.
288
+
289
+ [*Note 1*: The associated character encoding for ordinary character
290
+ literals determines encodability, but does not determine the value of
291
+ non-encodable ordinary character literals or ordinary multicharacter
292
+ literals. The examples in [[lex.ccon.literal]] for non-encodable
293
+ ordinary character literals assume that the specified character lacks
294
+ representation in the ordinary literal encoding or that encoding the
295
+ character would require more than one code unit. — *end note*]
296
+
297
+ **Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
298
+
299
+ | Encoding prefix | Kind | Type | Associated character encoding | Example |
300
+ | ---- | -------------------------- | ---------- | ------------ | ------- |
301
+ | none | ordinary character literal | `char` | ordinary | `'v'` |
302
+ | `L` | wide character literal | `wchar_t` | wide literal | `L'w'` |
303
+ | | | | encoding | |
304
+ | `u8` | UTF-8 character literal | `char8_t` | UTF-8 | `u8'x'` |
305
+ | `u` | UTF-16 character literal | `char16_t` | UTF-16 | `u'y'` |
306
+ | `U` | UTF-32 character literal | `char32_t` | UTF-32 | `U'z'` |
307
+
308
+
309
+ In translation phase 4, the value of a *character-literal* is determined
310
+ using the range of representable values of the *character-literal*’s
311
+ type in translation phase 7. A non-encodable character literal or a
312
+ multicharacter literal has an *implementation-defined* value. The value
313
+ of any other kind of *character-literal* is determined as follows:
314
+
315
+ - A *character-literal* with a *c-char-sequence* consisting of a single
316
+ *basic-c-char*, *simple-escape-sequence*, or
317
+ *universal-character-name* is the code unit value of the specified
318
+ character as encoded in the literal’s associated character encoding.
319
+ \[*Note 2*: If the specified character lacks representation in the
320
+ literal’s associated character encoding or if it cannot be encoded as
321
+ a single code unit, then the literal is a non-encodable character
322
+ literal. — *end note*]
323
+ - A *character-literal* with a *c-char-sequence* consisting of a single
324
+ *numeric-escape-sequence* has a value as follows:
325
+ - Let v be the integer value represented by the octal number
326
+ comprising the sequence of *octal-digit*s in an
327
+ *octal-escape-sequence* or by the hexadecimal number comprising the
328
+ sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
329
+ - If v does not exceed the range of representable values of the
330
+ *character-literal*’s type, then the value is v.
331
+ - Otherwise, if the *character-literal*’s *encoding-prefix* is absent
332
+ or `L`, and v does not exceed the range of representable values of
333
+ the corresponding unsigned type for the underlying type of the
334
+ *character-literal*’s type, then the value is the unique value of
335
+ the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
336
+ where N is the width of `T`.
337
+ - Otherwise, the *character-literal* is ill-formed.
338
+ - A *character-literal* with a *c-char-sequence* consisting of a single
339
+ *conditional-escape-sequence* is conditionally-supported and has an
340
+ *implementation-defined* value.
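
The rules above can be illustrated with a few compile-time checks. This is a sketch under stated assumptions: `code_unit` is a hypothetical helper, and multicharacter and non-encodable literals are left out because their values are implementation-defined.

```cpp
#include <type_traits>

// The encoding-prefix selects the type of the literal.
static_assert(std::is_same<decltype('v'), char>::value, "ordinary");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value, "wide");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "UTF-16");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "UTF-32");

// A numeric-escape-sequence denotes the value v directly.
static_assert('\101' == '\x41', "octal 101 and hexadecimal 41 are the same v");
static_assert(U'\x41' == U'A', "in UTF-32 the code unit for A is 0x41");
static_assert(u'\xFFFF' == 0xFFFF, "v fits in char16_t");

// Hypothetical helper: the code unit of an ordinary character literal,
// viewed through unsigned char.
constexpr unsigned code_unit(char c) { return static_cast<unsigned char>(c); }

// With no encoding-prefix, a v that fits the corresponding unsigned type is
// reduced modulo 2^N, so this holds whether char is signed or unsigned.
static_assert(code_unit('\xff') == 0xFFu, "value congruent to 0xff");
```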
341
+
342
+ The character specified by a *simple-escape-sequence* is specified in
343
+ [[lex.ccon.esc]].
344
+
345
+ [*Note 3*: Using an escape sequence for a question mark is supported
346
+ for compatibility with ISO C++14 and ISO C. — *end note*]
347
+
348
+ **Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
349
+
350
+ | character | | *simple-escape-sequence* |
351
+ | --------- | -------------------- | ------------------------ |
352
+ | `U+000a` | line feed | `\n` |
353
+ | `U+0009` | character tabulation | `\t` |
354
+ | `U+000b` | line tabulation | `\v` |
355
+ | `U+0008` | backspace | `\b` |
356
+ | `U+000d` | carriage return | `\r` |
357
+ | `U+000c` | form feed | `\f` |
358
+ | `U+0007` | alert | `\a` |
359
+ | `U+005c` | reverse solidus | `\\` |
360
+ | `U+003f` | question mark | `\?` |
361
+ | `U+0027` | apostrophe | `\'` |
362
+ | `U+0022` | quotation mark | `\"` |
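
The table can be written out as data and cross-checked against what the compiler produces. This is only an illustrative sketch: `SimpleEscape`, `kSimpleEscapes`, `kNotSimpleEscape`, and `escape_target` are hypothetical names, and UTF-32 literals are used so that the expected values are the code points listed above.

```cpp
// One row of the table: the character after the backslash and its code point.
struct SimpleEscape {
    char letter;
    char32_t code_point;
};

constexpr SimpleEscape kSimpleEscapes[] = {
    {'n', 0x000A}, {'t', 0x0009}, {'v', 0x000B}, {'b', 0x0008},
    {'r', 0x000D}, {'f', 0x000C}, {'a', 0x0007}, {'\\', 0x005C},
    {'?', 0x003F}, {'\'', 0x0027}, {'"', 0x0022},
};

// Sentinel for characters that do not form a simple-escape-sequence.
constexpr char32_t kNotSimpleEscape = 0xFFFFFFFF;

// Looks up the code point denoted by backslash followed by `letter`.
constexpr char32_t escape_target(char letter) {
    for (const SimpleEscape& e : kSimpleEscapes) {
        if (e.letter == letter) return e.code_point;
    }
    return kNotSimpleEscape;
}

// The compiler's own UTF-32 character literals agree with the table.
static_assert(escape_target('n') == U'\n', "line feed");
static_assert(escape_target('\\') == U'\\', "reverse solidus");
static_assert(escape_target('?') == U'\?', "question mark");
```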
363
 
364
 
365
  ### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>
366
 
367
  ``` bnf
368
  floating-point-literal:
 
417
  digit-sequence '''ₒₚₜ digit
418
  ```
419
 
420
  ``` bnf
421
  floating-point-suffix: one of
422
+ 'f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16'
423
  ```
424
 
425
+ The type of a *floating-point-literal*
426
+ [[basic.fundamental]], [[basic.extended.fp]] is determined by its
427
  *floating-point-suffix* as specified in [[lex.fcon.type]].
428
 
429
+ [*Note 1*: The floating-point suffixes `f16`, `f32`, `f64`, `f128`,
430
+ `bf16`, `F16`, `F32`, `F64`, `F128`, and `BF16` are
431
+ conditionally-supported. See [[basic.extended.fp]]. — *end note*]
432
+
433
  **Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>
434
 
435
  | *floating-point-suffix* | type |
436
+ | ----------------------- | ----------------- |
437
  | none | `double` |
438
  | `f` or `F` | `float` |
439
  | `l` or `L` | `long double` |
440
+ | `f16` or `F16` | `std::float16_t` |
441
+ | `f32` or `F32` | `std::float32_t` |
442
+ | `f64` or `F64` | `std::float64_t` |
443
+ | `f128` or `F128` | `std::float128_t` |
444
+ | `bf16` or `BF16` | `std::bfloat16_t` |
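
The mapping from suffix to type can be verified with `static_assert`. The sketch below assumes nothing beyond the table. `has_type` is a hypothetical helper, and the extended suffix check is guarded by `__STDCPP_FLOAT32_T__` because those suffixes are conditionally-supported.

```cpp
#include <type_traits>

// Hypothetical helper: true when the argument's type is exactly T.
template <class T, class U>
constexpr bool has_type(U) { return std::is_same<T, U>::value; }

static_assert(has_type<double>(1.0), "no suffix");
static_assert(has_type<float>(1.0f), "f or F");
static_assert(has_type<long double>(1.0L), "l or L");

#if defined(__STDCPP_FLOAT32_T__)  // only where std::float32_t is provided
#include <stdfloat>
static_assert(has_type<std::float32_t>(1.0f32), "f32 or F32");
#endif
```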
445
 
446
 
447
  The *significand* of a *floating-point-literal* is the
448
  *fractional-constant* or *digit-sequence* of a
449
  *decimal-floating-point-literal* or the
 
452
  of *digit*s or *hexadecimal-digit*s and optional period are interpreted
453
  as a base N real number s, where N is 10 for a
454
  *decimal-floating-point-literal* and 16 for a
455
  *hexadecimal-floating-point-literal*.
456
 
457
+ [*Note 2*: Any optional separating single quotes are ignored when
458
  determining the value. — *end note*]
459
 
460
  If an *exponent-part* or *binary-exponent-part* is present, the exponent
461
  e of the *floating-point-literal* is the result of interpreting the
462
  sequence of an optional *sign* and the *digit*s as a base 10 integer.
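
As a rough illustration of how significand and exponent combine, the sketch below scales a significand s by base 10 for decimal literals and by base 2 for hexadecimal ones. `scaled` is a hypothetical helper, and the values are chosen to be exactly representable so that the comparisons do not depend on rounding.

```cpp
// Hypothetical helper: s multiplied by base raised to e.
// base is 10 for decimal and 2 for hexadecimal floating-point literals.
constexpr double scaled(double s, int base, int e) {
    double v = s;
    for (; e > 0; --e) v *= base;
    for (; e < 0; ++e) v /= base;
    return v;
}

static_assert(1.5e2 == scaled(1.5, 10, 2), "decimal: 1.5 * 10^2");
static_assert(0x1.8p3 == scaled(1.5, 2, 3), "hexadecimal: 0x1.8 is 1.5, times 2^3");
static_assert(0x10p-4 == scaled(16.0, 2, -4), "negative binary exponent");

// Separating single quotes do not change the value.
static_assert(1'000.25 == 1000.25, "digit separators are ignored");
```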
 
488
  s-char-sequence s-char
489
  ```
490
 
491
  ``` bnf
492
  s-char:
493
+ basic-s-char
494
  escape-sequence
495
  universal-character-name
496
  ```
497
 
498
+ ``` bnf
499
+ basic-s-char:
500
+ any member of the translation character set except the U+0022 (quotation mark),
501
+ U+005c (reverse solidus), or new-line character
502
+ ```
503
+
504
  ``` bnf
505
  raw-string:
506
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
507
  ```
508
 
 
512
  r-char-sequence r-char
513
  ```
514
 
515
  ``` bnf
516
  r-char:
517
+ any member of the translation character set, except a U+0029 (right parenthesis) followed by
518
+ the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
519
  ```
520
 
521
  ``` bnf
522
  d-char-sequence:
523
  d-char
524
  d-char-sequence d-char
525
  ```
526
 
527
  ``` bnf
528
  d-char:
529
+ any member of the basic character set except:
530
+ U+0020 (space), U+0028 (left parenthesis), U+0029 (right parenthesis), U+005c (reverse solidus),
531
+ U+0009 (character tabulation), U+000b (line tabulation), U+000c (form feed), and new-line
532
  ```
533
 
534
+ The kind of a *string-literal*, its type, and its associated character
535
+ encoding [[lex.charset]] are determined by its encoding prefix and
536
+ sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
537
+ where n is the number of encoded code units as described below.
538
+
539
+ **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
540
+
541
+ | Encoding prefix | Kind | Type | Associated character encoding | Examples |
542
+ | ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
543
+ | none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
544
+ | `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
545
+ | `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
546
+ | `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
547
+ | `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
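
The array types in the table, including the terminating null code unit counted in n, can be observed through `decltype`. This is a sketch only: `code_units` is a hypothetical helper, and the `u8` row is omitted because its element type differs between language versions.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value, "ordinary");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value, "wide");
static_assert(std::is_same<decltype(u"abc"), const char16_t (&)[4]>::value, "UTF-16");
static_assert(std::is_same<decltype(U"abc"), const char32_t (&)[4]>::value, "UTF-32");
static_assert(std::is_same<decltype(R"(abc)"), const char (&)[4]>::value, "raw, same type");

// Hypothetical helper: the array bound n, terminator included.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

static_assert(code_units("abc") == 4, "three characters plus the terminator");
```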
548
+
549
+
550
  A *string-literal* that has an `R` in the prefix is a *raw string
551
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
552
  *d-char-sequence* of a *raw-string* is the same sequence of characters
553
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
554
  at most 16 characters.
 
591
 
592
  is equivalent to `"x = \"\\\"y\\\"\""`.
593
 
594
  — *end example*]
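
A short sketch of the same idea in checkable form: inside a raw string the backslash is an ordinary character, and a delimiter allows `)"` to appear in the contents. `same_text` is a hypothetical helper written only for this illustration.

```cpp
// Hypothetical helper: compares two null-terminated strings at compile time.
constexpr bool same_text(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}

// No escape processing: backslash and n stay two separate characters.
static_assert(same_text(R"(a\nb)", "a\\nb"), "raw contents are taken verbatim");
static_assert(sizeof(R"(a\nb)") == 5, "four characters plus the terminator");

// With the delimiter x, the sequence )" no longer ends the literal.
static_assert(same_text(R"x(a)"b)x", "a)\"b"), "delimiter protects the contents");
```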
595
 
596
  Ordinary string literals and UTF-8 string literals are also referred to
597
  as narrow string literals.
598
 
599
+ The common *encoding-prefix* for a sequence of adjacent
600
+ *string-literal*s is determined pairwise as follows: If two
601
+ *string-literal*s have the same *encoding-prefix*, the common
602
+ *encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
603
+ no *encoding-prefix*, the common *encoding-prefix* is that of the other
604
+ *string-literal*. Any other combinations are ill-formed.
605
 
606
+ [*Note 3*: A *string-literal*’s rawness has no effect on the
607
+ determination of the common *encoding-prefix*. — *end note*]
608
 
609
  In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
610
+ concatenated. The lexical structure and grouping of the contents of the
611
+ individual *string-literal*s is retained.
612
+
613
+ [*Example 2*:
614
+
615
+ ``` cpp
616
+ "\xA" "B"
617
+ ```
618
+
619
+ represents the code unit `'\xA'` and the character `'B'` after
620
+ concatenation (and not the single code unit `'\xAB'`). Similarly,
621
+
622
+ ``` cpp
623
+ R"(\u00)" "41"
624
+ ```
625
+
626
+ represents six characters, starting with a backslash and ending with the
627
+ digit `1` (and not the single character `'A'` specified by a
628
+ *universal-character-name*).
629
 
630
  [[lex.string.concat]] has some examples of valid concatenations.
631
 
632
+ — *end example*]
633
+
634
  **Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
635
 
636
  | | | | | | | | | |
637
  | -------- | ------ | ------- | -------- | ------ | ------- | -------- | ------ | ------- |
638
  | Source | | Means | Source | | Means | Source | | Means |
639
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
640
  | `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
641
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
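
The concatenation rules and the two examples above can be checked at compile time. The block below is a sketch, not normative: `code_units` is a hypothetical helper that reports the array bound of the concatenated literal.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: array bound of a string literal, terminator included.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

// A literal without a prefix adopts the prefix of its neighbour.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value, "u + none");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value, "none + U");
// Rawness does not affect the common encoding-prefix.
static_assert(std::is_same<decltype(L"a" R"(b)"), const wchar_t (&)[3]>::value, "L + raw");

// Escape sequences never extend across the boundary between literals.
static_assert(code_units("\xA" "B") == 3, "code unit 0x0A, then B, then terminator");
static_assert(("\xA" "B")[0] == 0x0A, "first code unit is 0x0A, not 0xAB");
static_assert(code_units(R"(\u00)" "41") == 7, "six characters plus the terminator");
```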
642
 
643
 
644
  Evaluating a *string-literal* results in a string literal object with
645
+ static storage duration [[basic.stc]]. Whether all *string-literal*s are
646
+ distinct (that is, are stored in nonoverlapping objects) and whether
647
+ successive evaluations of a *string-literal* yield the same or a
648
+ different object is unspecified.
649
+
650
+ [*Note 4*: The effect of attempting to modify a string literal object
651
+ is undefined. — *end note*]
652
+
653
+ String literal objects are initialized with the sequence of code unit
654
+ values corresponding to the *string-literal*’s sequence of *s-char*s
655
+ (originally from non-raw string literals) and *r-char*s (originally from
656
+ raw string literals), plus a terminating U+0000 (null) character, in
657
+ order as follows:
658
+
659
+ - The sequence of characters denoted by each contiguous sequence of
660
+ *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
661
+ and *universal-character-name*s [[lex.charset]] is encoded to a code
662
+ unit sequence using the *string-literal*’s associated character
663
+ encoding. If a character lacks representation in the associated
664
+ character encoding, then the *string-literal* is
665
+ conditionally-supported and an *implementation-defined* code unit
666
+ sequence is encoded. \[*Note 5*: No character lacks representation in
667
+ any Unicode encoding form. — *end note*] When encoding a stateful
668
+ character encoding, implementations should encode the first such
669
+ sequence beginning with the initial encoding state and encode
670
+ subsequent sequences beginning with the final encoding state of the
671
+ prior sequence. \[*Note 6*: The encoded code unit sequence can differ
672
+ from the sequence of code units that would be obtained by encoding
673
+ each character independently. — *end note*]
674
+ - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
675
+ unit with a value as follows:
676
+ - Let v be the integer value represented by the octal number
677
+ comprising the sequence of *octal-digit*s in an
678
+ *octal-escape-sequence* or by the hexadecimal number comprising the
679
+ sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
680
+ - If v does not exceed the range of representable values of the
681
+ *string-literal*’s array element type, then the value is v.
682
+ - Otherwise, if the *string-literal*’s *encoding-prefix* is absent or
683
+ `L`, and v does not exceed the range of representable values of the
684
+ corresponding unsigned type for the underlying type of the
685
+ *string-literal*’s array element type, then the value is the unique
686
+ value of the *string-literal*’s array element type `T` that is
687
+ congruent to v modulo 2ᴺ, where N is the width of `T`.
688
+ - Otherwise, the *string-literal* is ill-formed.
689
+
690
+ When encoding a stateful character encoding, these sequences should
691
+ have no effect on encoding state.
692
+ - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
693
+ *implementation-defined* code unit sequence. When encoding a stateful
694
+ character encoding, it is *implementation-defined* what effect these
695
+ sequences have on encoding state.
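
A few compile-time checks make the initialization rules concrete. This sketch uses UTF-16 and UTF-32 literals where exact code unit values are asserted, since those encodings are fixed. `code_units` is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in the string literal object,
// terminating null included.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

static_assert(code_units("") == 1, "only the terminating null");

// One character can encode to several code units: U+1F600 in UTF-16 is a
// surrogate pair, while in UTF-32 it is a single code unit.
static_assert(code_units(u"\U0001F600") == 3, "surrogate pair plus terminator");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00, "pair values");
static_assert(code_units(U"\U0001F600") == 2, "one code unit plus terminator");

// Each numeric-escape-sequence contributes exactly one code unit.
static_assert(code_units("\101\x42") == 3, "two escapes plus terminator");
static_assert(U"\x41\102"[1] == U'B', "octal 102 is 0x42");
```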
696
 
697
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
698
 
699
  ``` bnf
700
  boolean-literal:
701
  'false'
702
  'true'
703
  ```
704
 
705
  The Boolean literals are the keywords `false` and `true`. Such literals
706
+ have type `bool`.
707
 
708
  ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
709
 
710
  ``` bnf
711
  pointer-literal:
712
  'nullptr'
713
  ```
714
 
715
+ The pointer literal is the keyword `nullptr`. It has type
716
  `std::nullptr_t`.
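
The types of the Boolean and pointer literals can be confirmed directly. This is a small sketch, with `is_null` as a hypothetical helper showing the conversion of `nullptr` to a null pointer value.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype(true), bool>::value, "true has type bool");
static_assert(std::is_same<decltype(false), bool>::value, "false has type bool");
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "nullptr");

// std::nullptr_t is itself not a pointer type.
static_assert(!std::is_pointer<std::nullptr_t>::value, "distinct from pointer types");

// Hypothetical helper: nullptr converts to a null pointer of any pointer type.
constexpr bool is_null(const void* p) { return p == nullptr; }
static_assert(is_null(nullptr), "converted null pointer compares equal to nullptr");
```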
717
 
718
  [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
719
  pointer type nor a pointer-to-member type; rather, a prvalue of this
720
  type is a null pointer constant and can be converted to a null pointer
 
778
  that could match that non-terminal.
779
 
780
  A *user-defined-literal* is treated as a call to a literal operator or
781
  literal operator template [[over.literal]]. To determine the form of
782
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
783
+ first let *S* be the set of declarations found by unqualified lookup for
784
+ the *literal-operator-id* whose literal suffix identifier is *X*
785
+ [[basic.lookup.unqual]]. *S* shall not be empty.
 
786
 
787
  If *L* is a *user-defined-integer-literal*, let *n* be the literal
788
  without its *ud-suffix*. If *S* contains a literal operator with
789
  parameter type `unsigned long long`, the literal *L* is treated as a
790
  call of the form
 
796
  Otherwise, *S* shall contain a raw literal operator or a numeric literal
797
  operator template [[over.literal]] but not both. If *S* contains a raw
798
  literal operator, the literal *L* is treated as a call of the form
799
 
800
  ``` cpp
801
+ operator ""X("n")
802
  ```
803
 
804
  Otherwise (*S* contains a numeric literal operator template), *L* is
805
  treated as a call of the form
806
 
 
809
  ```
810
 
811
  where *n* is the source character sequence c₁c₂...cₖ.
812
 
813
  [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
814
+ basic character set. — *end note*]
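
The three call forms for a *user-defined-integer-literal* can be sketched as follows. The suffixes `_km`, `_len`, and `_count` are hypothetical and chosen so that each lookup set *S* contains exactly one kind of operator.

```cpp
#include <cstddef>

// Literal operator taking unsigned long long: receives the value of n.
constexpr unsigned long long operator""_km(unsigned long long n) { return n * 1000; }

// Raw literal operator: receives the spelling of n as a string, "n".
constexpr std::size_t operator""_len(const char* s) {
    std::size_t count = 0;
    while (s[count] != '\0') ++count;
    return count;
}

// Numeric literal operator template: receives the characters c1, c2, ..., ck
// of the spelling as template arguments.
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

static_assert(12_km == 12000, "called as operator\"\"_km(12ULL)");
static_assert(0x1F_len == 4, "called as operator\"\"_len(\"0x1F\")");
static_assert(123_count == 3, "called as operator\"\"_count<'1', '2', '3'>()");
```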
815
 
816
  If *L* is a *user-defined-floating-point-literal*, let *f* be the
817
  literal without its *ud-suffix*. If *S* contains a literal operator with
818
  parameter type `long double`, the literal *L* is treated as a call of
819
  the form
 
825
  Otherwise, *S* shall contain a raw literal operator or a numeric literal
826
  operator template [[over.literal]] but not both. If *S* contains a raw
827
  literal operator, the *literal* *L* is treated as a call of the form
828
 
829
  ``` cpp
830
+ operator ""X("f")
831
  ```
832
 
833
  Otherwise (*S* contains a numeric literal operator template), *L* is
834
  treated as a call of the form
835
 
 
838
  ```
839
 
840
  where *f* is the source character sequence c₁c₂...cₖ.
841
 
842
  [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
843
+ basic character set. — *end note*]
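
The floating-point case follows the same pattern. In this sketch the suffixes `_half` and `_chars` are hypothetical: the first selects the `long double` form, the second the raw form that sees the spelling of *f*.

```cpp
#include <cstddef>

// Literal operator taking long double: receives the value of f.
constexpr long double operator""_half(long double f) { return f / 2; }

// Raw literal operator: receives the spelling of f as a string, "f".
constexpr std::size_t operator""_chars(const char* s) {
    std::size_t count = 0;
    while (s[count] != '\0') ++count;
    return count;
}

static_assert(3.0_half == 1.5L, "called as operator\"\"_half(3.0L)");
static_assert(1.5e3_chars == 5, "called as operator\"\"_chars(\"1.5e3\")");
```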
844
 
845
  If *L* is a *user-defined-string-literal*, let *str* be the literal
846
  without its *ud-suffix* and let *len* be the number of code units in
847
  *str* (i.e., its length excluding the terminating null character). If
848
  *S* contains a literal operator template with a non-type template
 
896
 
897
  [*Example 3*:
898
 
899
  ``` cpp
900
  int main() {
901
+ L"A" "B" "C"_x; // OK, same as L"ABC"_x
902
  "P"_x "Q" "R"_y; // error: two different ud-suffixes
903
  }
904
  ```
905
 
906
  — *end example*]
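
A compilable variant of the valid line in this example is sketched below. The suffix `_n` and its two overloads are hypothetical. Each returns *len*, the number of code units excluding the terminating null character.

```cpp
#include <cstddef>

// String literal operators: receive the concatenated string and its length.
constexpr std::size_t operator""_n(const char*, std::size_t len) { return len; }
constexpr std::size_t operator""_n(const wchar_t*, std::size_t len) { return len; }

static_assert("hello"_n == 5, "len excludes the terminating null");
// Concatenation happens first, so the suffix applies to the whole string.
static_assert("ab" "cd"_n == 4, "same as \"abcd\"_n");
static_assert(L"A" "B" "C"_n == 3, "same as L\"ABC\"_n");
```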
907
 
908
  <!-- Link reference definitions -->
909
+ [basic.extended.fp]: basic.md#basic.extended.fp
910
  [basic.fundamental]: basic.md#basic.fundamental
911
  [basic.link]: basic.md#basic.link
912
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
913
  [basic.stc]: basic.md#basic.stc
914
+ [character.seq]: library.md#character.seq
915
  [conv.mem]: expr.md#conv.mem
916
  [conv.ptr]: expr.md#conv.ptr
917
  [cpp]: cpp.md#cpp
 
918
  [cpp.cond]: cpp.md#cpp.cond
919
  [cpp.import]: cpp.md#cpp.import
920
  [cpp.include]: cpp.md#cpp.include
921
  [cpp.module]: cpp.md#cpp.module
922
  [cpp.stringize]: cpp.md#cpp.stringize
923
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
924
+ [expr.prim.literal]: expr.md#expr.prim.literal
925
  [headers]: library.md#headers
926
  [lex]: #lex
927
  [lex.bool]: #lex.bool
928
  [lex.ccon]: #lex.ccon
929
  [lex.ccon.esc]: #lex.ccon.esc
930
+ [lex.ccon.literal]: #lex.ccon.literal
931
  [lex.charset]: #lex.charset
932
+ [lex.charset.basic]: #lex.charset.basic
933
+ [lex.charset.literal]: #lex.charset.literal
934
  [lex.comment]: #lex.comment
935
  [lex.digraph]: #lex.digraph
936
  [lex.ext]: #lex.ext
937
  [lex.fcon]: #lex.fcon
938
  [lex.fcon.type]: #lex.fcon.type
 
943
  [lex.key]: #lex.key
944
  [lex.key.digraph]: #lex.key.digraph
945
  [lex.literal]: #lex.literal
946
  [lex.literal.kinds]: #lex.literal.kinds
947
  [lex.name]: #lex.name
 
 
948
  [lex.name.special]: #lex.name.special
949
  [lex.nullptr]: #lex.nullptr
950
  [lex.operators]: #lex.operators
951
  [lex.phases]: #lex.phases
952
  [lex.ppnumber]: #lex.ppnumber
953
  [lex.pptoken]: #lex.pptoken
954
  [lex.separate]: #lex.separate
955
  [lex.string]: #lex.string
956
  [lex.string.concat]: #lex.string.concat
957
+ [lex.string.literal]: #lex.string.literal
958
  [lex.token]: #lex.token
959
  [module.import]: module.md#module.import
960
  [module.unit]: module.md#module.unit
961
  [over.literal]: over.md#over.literal
962
+ [support.types.layout]: support.md#support.types.layout
963
  [temp.explicit]: temp.md#temp.explicit
964
  [temp.names]: temp.md#temp.names
965
 
966
+ [^1]: Implementations behave as if these separate phases occur, although
967
+ in practice different phases can be folded together.
968
 
969
  [^2]: A partial preprocessing token would arise from a source file
970
  ending in the first portion of a multi-character token that requires
971
  a terminating sequence of characters, such as a *header-name* that
972
  is missing the closing `"` or `>`. A partial comment would arise
973
  from a source file ending with an unclosed `/*` comment.
974
 
975
+ [^3]: These include “digraphs” and additional reserved words. The term
976
  “digraph” (token consisting of two characters) is not perfectly
977
  descriptive, since one of the alternative *preprocessing-token*s is
978
  `%:%:` and of course several primary tokens contain two characters.
979
  Nonetheless, those alternative tokens that aren’t lexical keywords
980
  are colloquially known as “digraphs”.
981
 
982
+ [^4]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
983
  will be different, maintaining the source spelling, but the tokens
984
  can otherwise be freely interchanged.
985
 
986
+ [^5]: Literals include strings and character and numeric literals.
987
 
988
+ [^6]: Thus, a sequence of characters that resembles an escape sequence
989
+ can result in an error, be interpreted as the character
990
  corresponding to the escape sequence, or have a completely different
991
  meaning, depending on the implementation.
992
 
993
+ [^7]: On systems in which linkers cannot accept extended characters, an
994
+ encoding of the *universal-character-name* can be used in forming
995
  valid external identifiers. For example, some otherwise unused
996
+ character or sequence of characters can be used to encode the `\u` in
997
+ a *universal-character-name*. Extended characters can produce a
998
  long external identifier, but C++ does not place a translation limit
999
+ on significant characters for external identifiers.
 
 
1000
 
1001
+ [^8]: The term “literal” generally designates, in this document, those
1002
  tokens that are called “constants” in ISO C.