From Jason Turner

[lex.literal]


Files changed (1)
  1. tmp/tmpwm5zlpj2/{from.md → to.md} +111 -93
tmp/tmpwm5zlpj2/{from.md → to.md} RENAMED
@@ -115,12 +115,12 @@ size-suffix: one of
115
  'z Z'
116
  ```
117
 
118
  In an *integer-literal*, the sequence of *binary-digit*s,
119
  *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
120
- base N integer as shown in table [[lex.icon.base]]; the lexically first
121
- digit of the sequence of digits is the most significant.
122
 
123
  [*Note 1*: The prefix and any optional separating single quotes are
124
  ignored when determining the value. — *end note*]
125
 
126
  **Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>
@@ -173,20 +173,23 @@ which its value can be represented.
173
  | | | `std::size_t` |
174
  | Both `u` or `U` | `std::size_t` | `std::size_t` |
175
  | and `z` or `Z` | | |
176
 
177
 
178
- If an *integer-literal* cannot be represented by any type in its list
 
179
  and an extended integer type [[basic.fundamental]] can represent its
180
  value, it may have that extended integer type. If all of the types in
181
  the list for the *integer-literal* are signed, the extended integer type
182
- shall be signed. If all of the types in the list for the
183
- *integer-literal* are unsigned, the extended integer type shall be
184
- unsigned. If the list contains both signed and unsigned types, the
185
- extended integer type may be signed or unsigned. A program is ill-formed
186
- if one of its translation units contains an *integer-literal* that
187
- cannot be represented by any of the allowed types.
 
 
188
 
189
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
190
 
191
  ``` bnf
192
  character-literal:
@@ -198,12 +201,11 @@ encoding-prefix: one of
198
  'u8' 'u' 'U' 'L'
199
  ```
200
 
201
  ``` bnf
202
  c-char-sequence:
203
- c-char
204
- c-char-sequence c-char
205
  ```
206
 
207
  ``` bnf
208
  c-char:
209
  basic-c-char
@@ -240,12 +242,11 @@ numeric-escape-sequence:
240
  hexadecimal-escape-sequence
241
  ```
242
 
243
  ``` bnf
244
  simple-octal-digit-sequence:
245
- octal-digit
246
- simple-octal-digit-sequence octal-digit
247
  ```
248
 
249
  ``` bnf
250
  octal-escape-sequence:
251
  '\' octal-digit
@@ -268,60 +269,47 @@ conditional-escape-sequence:
268
  ``` bnf
269
  conditional-escape-sequence-char:
270
  any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
271
  ```
272
 
273
- A *non-encodable character literal* is a *character-literal* whose
274
- *c-char-sequence* consists of a single *c-char* that is not a
275
- *numeric-escape-sequence* and that specifies a character that either
276
- lacks representation in the literal’s associated character encoding or
277
- that cannot be encoded as a single code unit. A *multicharacter literal*
278
- is a *character-literal* whose *c-char-sequence* consists of more than
279
- one *c-char*. The *encoding-prefix* of a non-encodable character literal
280
- or a multicharacter literal shall be absent. Such *character-literal*s
281
- are conditionally-supported.
282
 
283
  The kind of a *character-literal*, its type, and its associated
284
  character encoding [[lex.charset]] are determined by its
285
  *encoding-prefix* and its *c-char-sequence* as defined by
286
- [[lex.ccon.literal]]. The special cases for non-encodable character
287
- literals and multicharacter literals take precedence over the base kind.
288
-
289
- [*Note 1*: The associated character encoding for ordinary character
290
- literals determines encodability, but does not determine the value of
291
- non-encodable ordinary character literals or ordinary multicharacter
292
- literals. The examples in [[lex.ccon.literal]] for non-encodable
293
- ordinary character literals assume that the specified character lacks
294
- representation in the ordinary literal encoding or that encoding the
295
- character would require more than one code unit. — *end note*]
296
 
297
  **Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
298
 
299
- | | | | | |
300
- | ---- | -------------------------- | ---------- | ------------ | ------- |
301
- | none | ordinary character literal | `char` | ordinary | `'v'` |
302
  | `L` | wide character literal | `wchar_t` | wide literal | `L'w'` |
303
  | | | | encoding | |
304
  | `u8` | UTF-8 character literal | `char8_t` | UTF-8 | `u8'x'` |
305
  | `u` | UTF-16 character literal | `char16_t` | UTF-16 | `u'y'` |
306
  | `U` | UTF-32 character literal | `char32_t` | UTF-32 | `U'z'` |
307
 
308
 
309
  In translation phase 4, the value of a *character-literal* is determined
310
  using the range of representable values of the *character-literal*’s
311
- type in translation phase 7. A non-encodable character literal or a
312
- multicharacter literal has an *implementation-defined* value. The value
313
- of any other kind of *character-literal* is determined as follows:
314
 
315
  - A *character-literal* with a *c-char-sequence* consisting of a single
316
  *basic-c-char*, *simple-escape-sequence*, or
317
  *universal-character-name* is the code unit value of the specified
318
  character as encoded in the literal’s associated character encoding.
319
- \[*Note 2*: If the specified character lacks representation in the
320
- literal’s associated character encoding or if it cannot be encoded as
321
- a single code unit, then the literal is a non-encodable character
322
- literal. — *end note*]
323
  - A *character-literal* with a *c-char-sequence* consisting of a single
324
  *numeric-escape-sequence* has a value as follows:
325
  - Let v be the integer value represented by the octal number
326
  comprising the sequence of *octal-digit*s in an
327
  *octal-escape-sequence* or by the hexadecimal number comprising the
@@ -332,20 +320,20 @@ of any other kind of *character-literal* is determined as follows:
332
  or `L`, and v does not exceed the range of representable values of
333
  the corresponding unsigned type for the underlying type of the
334
  *character-literal*’s type, then the value is the unique value of
335
  the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
336
  where N is the width of `T`.
337
- - Otherwise, the *character-literal* is ill-formed.
338
  - A *character-literal* with a *c-char-sequence* consisting of a single
339
  *conditional-escape-sequence* is conditionally-supported and has an
340
  *implementation-defined* value.
341
 
342
  The character specified by a *simple-escape-sequence* is specified in
343
  [[lex.ccon.esc]].
344
 
345
- [*Note 3*: Using an escape sequence for a question mark is supported
346
- for compatibility with ISO C++14 and ISO C. — *end note*]
347
 
348
  **Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
349
 
350
  | character | | *simple-escape-sequence* |
351
  | --------- | -------------------- | ------------------------ |
@@ -482,12 +470,11 @@ string-literal:
482
  encoding-prefixₒₚₜ 'R' raw-string
483
  ```
484
 
485
  ``` bnf
486
  s-char-sequence:
487
- s-char
488
- s-char-sequence s-char
489
  ```
490
 
491
  ``` bnf
492
  s-char:
493
  basic-s-char
@@ -506,24 +493,22 @@ raw-string:
506
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
507
  ```
508
 
509
  ``` bnf
510
  r-char-sequence:
511
- r-char
512
- r-char-sequence r-char
513
  ```
514
 
515
  ``` bnf
516
  r-char:
517
  any member of the translation character set, except a U+0029 (right parenthesis) followed by
518
  the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
519
  ```
520
 
521
  ``` bnf
522
  d-char-sequence:
523
- d-char
524
- d-char-sequence d-char
525
  ```
526
 
527
  ``` bnf
528
  d-char:
529
  any member of the basic character set except:
@@ -532,16 +517,17 @@ d-char:
532
  ```
533
 
534
  The kind of a *string-literal*, its type, and its associated character
535
  encoding [[lex.charset]] are determined by its encoding prefix and
536
  sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
537
- where n is the number of encoded code units as described below.
 
538
 
539
  **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
540
 
541
- | | | | | |
542
- | ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
543
  | none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
544
  | `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
545
  | `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
546
  | `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
547
  | `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
@@ -551,12 +537,12 @@ A *string-literal* that has an `R` in the prefix is a *raw string
551
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
552
  *d-char-sequence* of a *raw-string* is the same sequence of characters
553
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
554
  at most 16 characters.
555
 
556
- [*Note 1*: The characters `'('` and `')'` are permitted in a
557
- *raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
558
  `"(a|b)"`. — *end note*]
559
 
560
  [*Note 2*:
561
 
562
  A source-file new-line in a raw string literal results in a new-line in
@@ -592,18 +578,15 @@ R"(x = "\"y\"")"
592
  is equivalent to `"x = \"\\\"y\\\"\""`.
593
 
594
  — *end example*]
595
 
596
  Ordinary string literals and UTF-8 string literals are also referred to
597
- as narrow string literals.
598
 
599
- The common *encoding-prefix* for a sequence of adjacent
600
- *string-literal*s is determined pairwise as follows: If two
601
- *string-literal*s have the same *encoding-prefix*, the common
602
- *encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
603
- no *encoding-prefix*, the common *encoding-prefix* is that of the other
604
- *string-literal*. Any other combinations are ill-formed.
605
 
606
  [*Note 3*: A *string-literal*’s rawness has no effect on the
607
  determination of the common *encoding-prefix*. — *end note*]
608
 
609
  In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
@@ -640,16 +623,17 @@ digit `1` (and not the single character `'A'` specified by a
640
  | `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
641
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
642
 
643
 
644
  Evaluating a *string-literal* results in a string literal object with
645
- static storage duration [[basic.stc]]. Whether all *string-literal*s are
646
- distinct (that is, are stored in nonoverlapping objects) and whether
647
- successive evaluations of a *string-literal* yield the same or a
648
- different object is unspecified.
649
 
650
- [*Note 4*: The effect of attempting to modify a string literal object
 
 
 
 
651
  is undefined. — *end note*]
652
 
653
  String literal objects are initialized with the sequence of code unit
654
  values corresponding to the *string-literal*’s sequence of *s-char*s
655
  (originally from non-raw string literals) and *r-char*s (originally from
@@ -659,20 +643,19 @@ order as follows:
659
  - The sequence of characters denoted by each contiguous sequence of
660
  *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
661
  and *universal-character-name*s [[lex.charset]] is encoded to a code
662
  unit sequence using the *string-literal*’s associated character
663
  encoding. If a character lacks representation in the associated
664
- character encoding, then the *string-literal* is
665
- conditionally-supported and an *implementation-defined* code unit
666
- sequence is encoded. \[*Note 5*: No character lacks representation in
667
- any Unicode encoding form. *end note*] When encoding a stateful
668
- character encoding, implementations should encode the first such
669
- sequence beginning with the initial encoding state and encode
670
- subsequent sequences beginning with the final encoding state of the
671
- prior sequence. \[*Note 6*: The encoded code unit sequence can differ
672
- from the sequence of code units that would be obtained by encoding
673
- each character independently. — *end note*]
674
  - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
675
  unit with a value as follows:
676
  - Let v be the integer value represented by the octal number
677
  comprising the sequence of *octal-digit*s in an
678
  *octal-escape-sequence* or by the hexadecimal number comprising the
@@ -683,35 +666,53 @@ order as follows:
683
  `L`, and v does not exceed the range of representable values of the
684
  corresponding unsigned type for the underlying type of the
685
  *string-literal*’s array element type, then the value is the unique
686
  value of the *string-literal*’s array element type `T` that is
687
  congruent to v modulo 2ᴺ, where N is the width of `T`.
688
- - Otherwise, the *string-literal* is ill-formed.
689
 
690
  When encoding a stateful character encoding, these sequences should
691
  have no effect on encoding state.
692
  - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
693
  *implementation-defined* code unit sequence. When encoding a stateful
694
  character encoding, it is *implementation-defined* what effect these
695
  sequences have on encoding state.
696
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
697
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
698
 
699
  ``` bnf
700
  boolean-literal:
701
- 'false'
702
- 'true'
703
  ```
704
 
705
  The Boolean literals are the keywords `false` and `true`. Such literals
706
  have type `bool`.
707
 
708
  ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
709
 
710
  ``` bnf
711
  pointer-literal:
712
- 'nullptr'
713
  ```
714
 
715
  The pointer literal is the keyword `nullptr`. It has type
716
  `std::nullptr_t`.
717
 
@@ -843,11 +844,11 @@ where *f* is the source character sequence c₁c₂...cₖ.
843
  basic character set. — *end note*]
844
 
845
  If *L* is a *user-defined-string-literal*, let *str* be the literal
846
  without its *ud-suffix* and let *len* be the number of code units in
847
  *str* (i.e., its length excluding the terminating null character). If
848
- *S* contains a literal operator template with a non-type template
849
  parameter for which *str* is a well-formed *template-argument*, the
850
  literal *L* is treated as a call of the form
851
 
852
  ``` cpp
853
  operator ""X<str>()
@@ -910,26 +911,37 @@ int main() {
910
  [basic.fundamental]: basic.md#basic.fundamental
911
  [basic.link]: basic.md#basic.link
912
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
913
  [basic.stc]: basic.md#basic.stc
914
  [character.seq]: library.md#character.seq
 
915
  [conv.mem]: expr.md#conv.mem
916
  [conv.ptr]: expr.md#conv.ptr
917
  [cpp]: cpp.md#cpp
918
  [cpp.cond]: cpp.md#cpp.cond
 
919
  [cpp.import]: cpp.md#cpp.import
920
  [cpp.include]: cpp.md#cpp.include
921
  [cpp.module]: cpp.md#cpp.module
 
 
 
 
 
922
  [cpp.stringize]: cpp.md#cpp.stringize
923
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 
 
924
  [expr.prim.literal]: expr.md#expr.prim.literal
925
  [headers]: library.md#headers
 
926
  [lex]: #lex
927
  [lex.bool]: #lex.bool
928
  [lex.ccon]: #lex.ccon
929
  [lex.ccon.esc]: #lex.ccon.esc
930
  [lex.ccon.literal]: #lex.ccon.literal
 
931
  [lex.charset]: #lex.charset
932
  [lex.charset.basic]: #lex.charset.basic
933
  [lex.charset.literal]: #lex.charset.literal
934
  [lex.comment]: #lex.comment
935
  [lex.digraph]: #lex.digraph
@@ -953,50 +965,56 @@ int main() {
953
  [lex.pptoken]: #lex.pptoken
954
  [lex.separate]: #lex.separate
955
  [lex.string]: #lex.string
956
  [lex.string.concat]: #lex.string.concat
957
  [lex.string.literal]: #lex.string.literal
 
958
  [lex.token]: #lex.token
 
959
  [module.import]: module.md#module.import
 
960
  [module.unit]: module.md#module.unit
961
  [over.literal]: over.md#over.literal
962
  [support.types.layout]: support.md#support.types.layout
963
  [temp.explicit]: temp.md#temp.explicit
 
964
  [temp.names]: temp.md#temp.names
 
 
965
 
966
  [^1]: Implementations behave as if these separate phases occur, although
967
  in practice different phases can be folded together.
968
 
969
- [^2]: A partial preprocessing token would arise from a source file
 
 
 
 
 
970
  ending in the first portion of a multi-character token that requires
971
  a terminating sequence of characters, such as a *header-name* that
972
  is missing the closing `"` or `>`. A partial comment would arise
973
  from a source file ending with an unclosed `/*` comment.
974
 
975
- [^3]: These include “digraphs” and additional reserved words. The term
976
  “digraph” (token consisting of two characters) is not perfectly
977
  descriptive, since one of the alternative *preprocessing-token*s is
978
  `%:%:` and of course several primary tokens contain two characters.
979
  Nonetheless, those alternative tokens that aren’t lexical keywords
980
  are colloquially known as “digraphs”.
981
 
982
- [^4]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
983
  will be different, maintaining the source spelling, but the tokens
984
  can otherwise be freely interchanged.
985
 
986
- [^5]: Literals include strings and character and numeric literals.
987
-
988
- [^6]: Thus, a sequence of characters that resembles an escape sequence
989
- can result in an error, be interpreted as the character
990
- corresponding to the escape sequence, or have a completely different
991
- meaning, depending on the implementation.
992
 
993
  [^7]: On systems in which linkers cannot accept extended characters, an
994
  encoding of the *universal-character-name* can be used in forming
995
  valid external identifiers. For example, some otherwise unused
996
  character or sequence of characters can be used to encode the `\u` in
997
  a *universal-character-name*. Extended characters can produce a
998
  long external identifier, but C++ does not place a translation limit
999
  on significant characters for external identifiers.
1000
 
1001
  [^8]: The term “literal” generally designates, in this document, those
1002
- tokens that are called “constants” in ISO C.
 
115
  'z Z'
116
  ```
117
 
118
  In an *integer-literal*, the sequence of *binary-digit*s,
119
  *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
120
+ base N integer as shown in [[lex.icon.base]]; the lexically first digit
121
+ of the sequence of digits is the most significant.
122
 
123
  [*Note 1*: The prefix and any optional separating single quotes are
124
  ignored when determining the value. — *end note*]
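
The interpretation rule and Note 1 above can be checked at compile time. The following is a minimal sketch, not part of the standard text; it assumes a C++14 or later compiler (binary literals and digit separators), and the helper name `literal_bases_agree` is hypothetical.

```cpp
// Each literal below denotes the value 26; only the base prefix differs.
static_assert(0b11010 == 26, "binary: prefix 0b, base 2");
static_assert(032 == 26, "octal: prefix 0, base 8");
static_assert(0x1A == 26, "hexadecimal: prefix 0x, base 16");

// Separating single quotes are ignored when determining the value.
static_assert(1'000'000 == 1000000, "decimal with separators");
static_assert(0xFF'FF == 65535, "hexadecimal with separators");

// Hypothetical helper so the same facts can also be checked at run time.
constexpr bool literal_bases_agree() {
    return 0b11010 == 032 && 032 == 0x1A && 0x1A == 26;
}
```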
125
 
126
  **Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>
 
173
  | | | `std::size_t` |
174
  | Both `u` or `U` | `std::size_t` | `std::size_t` |
175
  | and `z` or `Z` | | |
176
 
177
 
178
+ Except for *integer-literal*s containing a *size-suffix*, if the value
179
+ of an *integer-literal* cannot be represented by any type in its list
180
  and an extended integer type [[basic.fundamental]] can represent its
181
  value, it may have that extended integer type. If all of the types in
182
  the list for the *integer-literal* are signed, the extended integer type
183
+ is signed. If all of the types in the list for the *integer-literal* are
184
+ unsigned, the extended integer type is unsigned. If the list contains
185
+ both signed and unsigned types, the extended integer type may be signed
186
+ or unsigned. If an *integer-literal* cannot be represented by any of the
187
+ allowed types, the program is ill-formed.
188
+
189
+ [*Note 2*: An *integer-literal* with a `z` or `Z` suffix is ill-formed
190
+ if it cannot be represented by `std::size_t`. — *end note*]
191
 
192
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
193
 
194
  ``` bnf
195
  character-literal:
 
201
  'u8' 'u' 'U' 'L'
202
  ```
203
 
204
  ``` bnf
205
  c-char-sequence:
206
+ c-char c-char-sequenceₒₚₜ
 
207
  ```
208
 
209
  ``` bnf
210
  c-char:
211
  basic-c-char
 
242
  hexadecimal-escape-sequence
243
  ```
244
 
245
  ``` bnf
246
  simple-octal-digit-sequence:
247
+ octal-digit simple-octal-digit-sequenceₒₚₜ
 
248
  ```
249
 
250
  ``` bnf
251
  octal-escape-sequence:
252
  '\' octal-digit
 
269
  ``` bnf
270
  conditional-escape-sequence-char:
271
  any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
272
  ```
273
 
274
+ A *multicharacter literal* is a *character-literal* whose
275
+ *c-char-sequence* consists of more than one *c-char*. A multicharacter
276
+ literal shall not have an *encoding-prefix*. If a multicharacter literal
277
+ contains a *c-char* that is not encodable as a single code unit in the
278
+ ordinary literal encoding, the program is ill-formed. Multicharacter
279
+ literals are conditionally-supported.
 
 
 
280
 
281
  The kind of a *character-literal*, its type, and its associated
282
  character encoding [[lex.charset]] are determined by its
283
  *encoding-prefix* and its *c-char-sequence* as defined by
284
+ [[lex.ccon.literal]].
 
 
 
 
 
 
 
 
 
285
 
286
  **Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
287
 
288
+ | Encoding prefix | Kind | Type | Associated character encoding | Example |
289
+ | --------------- | -------------------------- | ---------- | ------------------------------- | ------- |
290
+ | none | ordinary character literal | `char` | ordinary literal | `'v'` |
291
  | `L` | wide character literal | `wchar_t` | wide literal | `L'w'` |
292
  | | | | encoding | |
293
  | `u8` | UTF-8 character literal | `char8_t` | UTF-8 | `u8'x'` |
294
  | `u` | UTF-16 character literal | `char16_t` | UTF-16 | `u'y'` |
295
  | `U` | UTF-32 character literal | `char32_t` | UTF-32 | `U'z'` |
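
The type column of the table above can be verified with `decltype`. This is a sketch under stated assumptions: it needs only C++11, the helper name `prefixes_select_types` is hypothetical, and the `u8` row is left out because the type of a `u8` character literal differs between C++17 (`char`) and C++20 (`char8_t`).

```cpp
#include <type_traits>

// The encoding-prefix alone selects the type of a single-character literal.
static_assert(std::is_same<decltype('v'), char>::value,
              "no prefix: ordinary character literal");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value,
              "L: wide character literal");
static_assert(std::is_same<decltype(u'y'), char16_t>::value,
              "u: UTF-16 character literal");
static_assert(std::is_same<decltype(U'z'), char32_t>::value,
              "U: UTF-32 character literal");

// Hypothetical helper that restates the checks as a single value.
constexpr bool prefixes_select_types() {
    return std::is_same<decltype('v'), char>::value &&
           std::is_same<decltype(L'w'), wchar_t>::value &&
           std::is_same<decltype(u'y'), char16_t>::value &&
           std::is_same<decltype(U'z'), char32_t>::value;
}
```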
296
 
297
 
298
  In translation phase 4, the value of a *character-literal* is determined
299
  using the range of representable values of the *character-literal*’s
300
+ type in translation phase 7. A multicharacter literal has an
301
+ *implementation-defined* value. The value of any other kind of
302
+ *character-literal* is determined as follows:
303
 
304
  - A *character-literal* with a *c-char-sequence* consisting of a single
305
  *basic-c-char*, *simple-escape-sequence*, or
306
  *universal-character-name* is the code unit value of the specified
307
  character as encoded in the literal’s associated character encoding.
308
+ If the specified character lacks representation in the literal’s
309
+ associated character encoding or if it cannot be encoded as a single
310
+ code unit, then the program is ill-formed.
 
311
  - A *character-literal* with a *c-char-sequence* consisting of a single
312
  *numeric-escape-sequence* has a value as follows:
313
  - Let v be the integer value represented by the octal number
314
  comprising the sequence of *octal-digit*s in an
315
  *octal-escape-sequence* or by the hexadecimal number comprising the
 
320
  or `L`, and v does not exceed the range of representable values of
321
  the corresponding unsigned type for the underlying type of the
322
  *character-literal*’s type, then the value is the unique value of
323
  the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
324
  where N is the width of `T`.
325
+ - Otherwise, the program is ill-formed.
326
  - A *character-literal* with a *c-char-sequence* consisting of a single
327
  *conditional-escape-sequence* is conditionally-supported and has an
328
  *implementation-defined* value.
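
The numeric-escape-sequence rule in the list above can be illustrated with a few compile-time checks. This is a sketch, not normative text; the helper name `hex_escape_value` is hypothetical, and the `'\xFF'` check goes through `unsigned char` so that it holds whether plain `char` is signed or unsigned.

```cpp
// The value comes from the digits of the escape, not from any encoding.
static_assert('\x41' == 0x41, "hexadecimal escape");
static_assert('\101' == 65, "octal escape: 101 in base 8 is 65");
static_assert('\0' == 0, "octal escape with a single digit");
static_assert(u'\x20AC' == 0x20AC, "fits in char16_t");

// v = 255 fits the unsigned counterpart of char, so the literal is
// well-formed; its value is the char value congruent to 255 modulo 2^N.
static_assert(static_cast<unsigned char>('\xFF') == 0xFF,
              "holds for signed and unsigned char");

// Hypothetical helper exposing one of the values for a run-time check.
constexpr int hex_escape_value() { return '\x41'; }
```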
329
 
330
  The character specified by a *simple-escape-sequence* is specified in
331
  [[lex.ccon.esc]].
332
 
333
+ [*Note 1*: Using an escape sequence for a question mark is supported
334
+ for compatibility with C++14 and C. — *end note*]
335
 
336
  **Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
337
 
338
  | character | | *simple-escape-sequence* |
339
  | --------- | -------------------- | ------------------------ |
 
470
  encoding-prefixₒₚₜ 'R' raw-string
471
  ```
472
 
473
  ``` bnf
474
  s-char-sequence:
475
+ s-char s-char-sequenceₒₚₜ
 
476
  ```
477
 
478
  ``` bnf
479
  s-char:
480
  basic-s-char
 
493
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
494
  ```
495
 
496
  ``` bnf
497
  r-char-sequence:
498
+ r-char r-char-sequenceₒₚₜ
 
499
  ```
500
 
501
  ``` bnf
502
  r-char:
503
  any member of the translation character set, except a U+0029 (right parenthesis) followed by
504
  the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
505
  ```
506
 
507
  ``` bnf
508
  d-char-sequence:
509
+ d-char d-char-sequenceₒₚₜ
 
510
  ```
511
 
512
  ``` bnf
513
  d-char:
514
  any member of the basic character set except:
 
517
  ```
518
 
519
  The kind of a *string-literal*, its type, and its associated character
520
  encoding [[lex.charset]] are determined by its encoding prefix and
521
  sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
522
+ where n is the number of encoded code units that would result from an
523
+ evaluation of the *string-literal* (see below).
524
 
525
  **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
526
 
527
+ | Encoding prefix | Kind | Type | Associated character encoding | Examples |
528
+ | ----------------- | ----------------------- | ----------------------------- | ----------------------------- | ---------------------------------------------- |
529
  | none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
530
  | `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
531
  | `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
532
  | `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
533
  | `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
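
The array types in the table above can be observed directly. In this sketch (C++11, helper name `ordinary_literal_size` hypothetical), the strings contain only characters that encode as one code unit each, so `n` is the character count plus one for the terminating null. A string literal is an lvalue, which is why `decltype` yields a reference to the array type.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "ordinary string literal: array of 4 const char");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "wide string literal");
static_assert(std::is_same<decltype(u"abc"), const char16_t (&)[4]>::value,
              "UTF-16 string literal");
static_assert(std::is_same<decltype(U"abc"), const char32_t (&)[4]>::value,
              "UTF-32 string literal");
static_assert(sizeof("") == 1, "the empty literal still holds the null");

// Hypothetical helper: number of code units in "abc", null included.
constexpr std::size_t ordinary_literal_size() { return sizeof("abc"); }
```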
 
537
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
538
  *d-char-sequence* of a *raw-string* is the same sequence of characters
539
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
540
  at most 16 characters.
541
 
542
+ [*Note 1*: The characters `'('` and `')'` can appear in a *raw-string*.
543
+ Thus, `R"delimiter((a|b))delimiter"` is equivalent to
544
  `"(a|b)"`. — *end note*]
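
The equivalence stated in Note 1 can be checked at run time. The sketch below (C++11; the function names are hypothetical) compares the raw form against the escaped form, and repeats the check for the `R"(x = "\"y\"")"` example that appears in this subclause.

```cpp
#include <cstring>

// Parentheses inside the raw string are ordinary characters because the
// terminator is ')' followed by the delimiter and a quotation mark.
bool raw_parentheses_match() {
    return std::strcmp(R"delimiter((a|b))delimiter", "(a|b)") == 0;
}

// Backslashes and quotation marks are not escapes inside a raw string.
bool raw_quotes_match() {
    return std::strcmp(R"(x = "\"y\"")", "x = \"\\\"y\\\"\"") == 0;
}
```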
545
 
546
  [*Note 2*:
547
 
548
  A source-file new-line in a raw string literal results in a new-line in
 
578
  is equivalent to `"x = \"\\\"y\\\"\""`.
579
 
580
  — *end example*]
581
 
582
  Ordinary string literals and UTF-8 string literals are also referred to
583
+ as *narrow string literals*.
584
 
585
+ The *string-literal*s in any sequence of adjacent *string-literal*s
586
+ shall have at most one unique *encoding-prefix* among them. The common
587
+ *encoding-prefix* of the sequence is that *encoding-prefix*, if any.
 
 
 
588
 
589
  [*Note 3*: A *string-literal*’s rawness has no effect on the
590
  determination of the common *encoding-prefix*. — *end note*]
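
A short sketch of how the common *encoding-prefix* plays out in practice (C++11; the function name `common_prefix_examples_hold` is hypothetical). Each concatenation below has at most one distinct prefix, so it is well-formed under both the old and the new wording.

```cpp
#include <string>

bool common_prefix_examples_hold() {
    // An unprefixed literal adopts the prefix of its neighbour.
    bool utf16 = std::u16string(u"a" "b") == u"ab";
    bool utf32 = std::u32string("a" U"b") == U"ab";
    bool wide = std::wstring(L"a" "b") == L"ab";
    // Rawness has no effect on the common encoding-prefix (Note 3).
    bool raw = std::u16string(R"(a)" u"b") == u"ab";
    // A sequence such as u"a" U"b" has two distinct prefixes and is
    // ill-formed, so it cannot appear in compilable code.
    return utf16 && utf32 && wide && raw;
}
```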
591
 
592
  In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
 
623
  | `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
624
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
625
 
626
 
627
  Evaluating a *string-literal* results in a string literal object with
628
+ static storage duration [[basic.stc]].
 
 
 
629
 
630
+ [*Note 4*: String literal objects are potentially non-unique
631
+ [[intro.object]]. Whether successive evaluations of a *string-literal*
632
+ yield the same or a different object is unspecified. — *end note*]
633
+
634
+ [*Note 5*: The effect of attempting to modify a string literal object
635
  is undefined. — *end note*]
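
As an illustration of static storage duration, the sketch below (C++11; `greeting` is a hypothetical function) returns a pointer to a string literal object. The pointer remains valid after the function returns, while pointer identity between separate evaluations is deliberately not tested because it is unspecified.

```cpp
#include <cstring>

// The string literal object outlives the call, so the returned pointer
// does not dangle. The pointee is const; writing through a cast would be
// an attempt to modify a string literal object.
const char* greeting() { return "hello"; }

// Compares contents only. Whether greeting() == greeting() holds as a
// pointer comparison is unspecified and is not relied upon here.
bool greeting_is_stable() {
    return std::strcmp(greeting(), "hello") == 0 &&
           std::strcmp(greeting(), greeting()) == 0;
}
```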
636
 
637
  String literal objects are initialized with the sequence of code unit
638
  values corresponding to the *string-literal*’s sequence of *s-char*s
639
  (originally from non-raw string literals) and *r-char*s (originally from
 
643
  - The sequence of characters denoted by each contiguous sequence of
644
  *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
645
  and *universal-character-name*s [[lex.charset]] is encoded to a code
646
  unit sequence using the *string-literal*’s associated character
647
  encoding. If a character lacks representation in the associated
648
+ character encoding, then the program is ill-formed. \[*Note 6*: No
649
+ character lacks representation in any Unicode encoding
650
+ form. *end note*] When encoding a stateful character encoding,
651
+ implementations should encode the first such sequence beginning with
652
+ the initial encoding state and encode subsequent sequences beginning
653
+ with the final encoding state of the prior sequence. \[*Note 7*: The
654
+ encoded code unit sequence can differ from the sequence of code units
655
+ that would be obtained by encoding each character
656
+ independently. *end note*]
 
657
  - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
658
  unit with a value as follows:
659
  - Let v be the integer value represented by the octal number
660
  comprising the sequence of *octal-digit*s in an
661
  *octal-escape-sequence* or by the hexadecimal number comprising the
 
666
  `L`, and v does not exceed the range of representable values of the
667
  corresponding unsigned type for the underlying type of the
668
  *string-literal*’s array element type, then the value is the unique
669
  value of the *string-literal*’s array element type `T` that is
670
  congruent to v modulo 2ᴺ, where N is the width of `T`.
671
+ - Otherwise, the program is ill-formed.
672
 
673
  When encoding a stateful character encoding, these sequences should
674
  have no effect on encoding state.
675
  - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
676
  *implementation-defined* code unit sequence. When encoding a stateful
677
  character encoding, it is *implementation-defined* what effect these
678
  sequences have on encoding state.
679
 
680
+ ### Unevaluated strings <a id="lex.string.uneval">[[lex.string.uneval]]</a>
681
+
682
+ ``` bnf
683
+ unevaluated-string:
684
+ string-literal
685
+ ```
686
+
687
+ An *unevaluated-string* shall have no *encoding-prefix*.
688
+
689
+ Each *universal-character-name* and each *simple-escape-sequence* in an
690
+ *unevaluated-string* is replaced by the member of the translation
691
+ character set it denotes. An *unevaluated-string* that contains a
692
+ *numeric-escape-sequence* or a *conditional-escape-sequence* is
693
+ ill-formed.
694
+
695
+ An *unevaluated-string* is never evaluated and its interpretation
696
+ depends on the context in which it appears.
697
+
698
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
699
 
700
  ``` bnf
701
  boolean-literal:
702
+ false
703
+ true
704
  ```
705
 
706
  The Boolean literals are the keywords `false` and `true`. Such literals
707
  have type `bool`.
  
  ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
  
  ``` bnf
  pointer-literal:
+ nullptr
  ```
  
  The pointer literal is the keyword `nullptr`. It has type
  `std::nullptr_t`.
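
Because `nullptr` has its own type, it can be told apart from the integer literal `0` in overload resolution. The sketch below assumes a hypothetical overload set named `which`; it is an illustration, not text from the section.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");

// Hypothetical overload set used to observe which type each literal has.
inline int which(int) { return 1; }
inline int which(std::nullptr_t) { return 2; }

void check_pointer_literal() {
    assert(which(nullptr) == 2);  // exact match on std::nullptr_t
    assert(which(0) == 1);        // 0 is an integer-literal of type int
    int* p = nullptr;             // converts to a null pointer value
    assert(p == nullptr);
}
```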
 
 
  basic character set. — *end note*]
  
  If *L* is a *user-defined-string-literal*, let *str* be the literal
  without its *ud-suffix* and let *len* be the number of code units in
  *str* (i.e., its length excluding the terminating null character). If
+ *S* contains a literal operator template with a constant template
  parameter for which *str* is a well-formed *template-argument*, the
  literal *L* is treated as a call of the form
  
  ``` cpp
  operator ""X<str>()
 
  [basic.fundamental]: basic.md#basic.fundamental
  [basic.link]: basic.md#basic.link
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
  [basic.stc]: basic.md#basic.stc
  [character.seq]: library.md#character.seq
+ [class.mem.general]: class.md#class.mem.general
  [conv.mem]: expr.md#conv.mem
  [conv.ptr]: expr.md#conv.ptr
  [cpp]: cpp.md#cpp
  [cpp.cond]: cpp.md#cpp.cond
+ [cpp.embed]: cpp.md#cpp.embed
  [cpp.import]: cpp.md#cpp.import
  [cpp.include]: cpp.md#cpp.include
  [cpp.module]: cpp.md#cpp.module
+ [cpp.pragma]: cpp.md#cpp.pragma
+ [cpp.pragma.op]: cpp.md#cpp.pragma.op
+ [cpp.pre]: cpp.md#cpp.pre
+ [cpp.predefined]: cpp.md#cpp.predefined
+ [cpp.replace]: cpp.md#cpp.replace
  [cpp.stringize]: cpp.md#cpp.stringize
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
+ [dcl.pre]: dcl.md#dcl.pre
+ [expr.const]: expr.md#expr.const
  [expr.prim.literal]: expr.md#expr.prim.literal
  [headers]: library.md#headers
+ [intro.object]: basic.md#intro.object
  [lex]: #lex
  [lex.bool]: #lex.bool
  [lex.ccon]: #lex.ccon
  [lex.ccon.esc]: #lex.ccon.esc
  [lex.ccon.literal]: #lex.ccon.literal
+ [lex.char]: #lex.char
  [lex.charset]: #lex.charset
  [lex.charset.basic]: #lex.charset.basic
  [lex.charset.literal]: #lex.charset.literal
  [lex.comment]: #lex.comment
  [lex.digraph]: #lex.digraph
 
  [lex.pptoken]: #lex.pptoken
  [lex.separate]: #lex.separate
  [lex.string]: #lex.string
  [lex.string.concat]: #lex.string.concat
  [lex.string.literal]: #lex.string.literal
+ [lex.string.uneval]: #lex.string.uneval
  [lex.token]: #lex.token
+ [lex.universal.char]: #lex.universal.char
  [module.import]: module.md#module.import
+ [module.reach]: module.md#module.reach
  [module.unit]: module.md#module.unit
  [over.literal]: over.md#over.literal
  [support.types.layout]: support.md#support.types.layout
  [temp.explicit]: temp.md#temp.explicit
+ [temp.inst]: temp.md#temp.inst
  [temp.names]: temp.md#temp.names
+ [temp.point]: temp.md#temp.point
+ [uaxid]: uax31.md#uaxid
  
  [^1]: Implementations behave as if these separate phases occur, although
  in practice different phases can be folded together.
  
+ [^2]: Unicode® is a registered trademark of Unicode, Inc. This
+ information is given for the convenience of users of this document
+ and does not constitute an endorsement by ISO or IEC of this
+ product.
+
+ [^3]: A partial preprocessing token would arise from a source file
  ending in the first portion of a multi-character token that requires
  a terminating sequence of characters, such as a *header-name* that
  is missing the closing `"` or `>`. A partial comment would arise
  from a source file ending with an unclosed `/*` comment.
  
+ [^4]: These include “digraphs” and additional reserved words. The term
  “digraph” (token consisting of two characters) is not perfectly
  descriptive, since one of the alternative *preprocessing-token*s is
  `%:%:` and of course several primary tokens contain two characters.
  Nonetheless, those alternative tokens that aren’t lexical keywords
  are colloquially known as “digraphs”.
  
+ [^5]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
  will be different, maintaining the source spelling, but the tokens
  can otherwise be freely interchanged.
  
+ [^6]: Literals include strings and character and numeric literals.
  
  [^7]: On systems in which linkers cannot accept extended characters, an
  encoding of the *universal-character-name* can be used in forming
  valid external identifiers. For example, some otherwise unused
  character or sequence of characters can be used to encode the `\u` in
  a *universal-character-name*. Extended characters can produce a
  long external identifier, but C++ does not place a translation limit
  on significant characters for external identifiers.
  
  [^8]: The term “literal” generally designates, in this document, those
+ tokens that are called “constants” in C.