[text.encoding.class] - C++23 → Trunk

tmp/tmp67e13wtf/{from.md → to.md} +565 -0

	@@ -0,0 +1,565 @@

+### Class `text_encoding` <a id="text.encoding.class">[[text.encoding.class]]</a>
+#### Overview <a id="text.encoding.overview">[[text.encoding.overview]]</a>
+The class `text_encoding` describes an interface for accessing the IANA
+Character Sets registry.
+``` cpp
+namespace std {
+  struct text_encoding {
+    static constexpr size_t max_name_length = 63;
+    // [text.encoding.id], enumeration text_encoding::id
+    enum class id : int_least32_t {
+      see below
+    };
+    using enum id;
+    constexpr text_encoding() = default;
+    constexpr explicit text_encoding(string_view enc) noexcept;
+    constexpr text_encoding(id i) noexcept;
+    constexpr id mib() const noexcept;
+    constexpr const char* name() const noexcept;
+    // [text.encoding.aliases], class text_encoding::aliases_view
+    struct aliases_view;
+    constexpr aliases_view aliases() const noexcept;
+    friend constexpr bool operator==(const text_encoding& a,
+                                     const text_encoding& b) noexcept;
+    friend constexpr bool operator==(const text_encoding& encoding, id i) noexcept;
+    static consteval text_encoding literal() noexcept;
+    static text_encoding environment();
+    template<id i> static bool environment_is();
+  private:
+    id mib_ = id::unknown;                                              // exposition only
+    char name_[max_name_length + 1] = {0};                              // exposition only
+    static constexpr bool comp-name(string_view a, string_view b);      // exposition only
+  };
+}
+```
+Class `text_encoding` is a trivially copyable type
+[[term.trivially.copyable.type]].
+#### General <a id="text.encoding.general">[[text.encoding.general]]</a>
+A *registered character encoding* is a character encoding scheme in the
+IANA Character Sets registry.
+[*Note 1*: The IANA Character Sets registry uses the term “character
+sets” to refer to character encodings. — *end note*]
+The primary name of a registered character encoding is the name of that
+encoding specified in the IANA Character Sets registry.
+The set of known registered character encodings contains every
+registered character encoding specified in the IANA Character Sets
+registry except for the following:
+- NATS-DANO (33)
+- NATS-DANO-ADD (34)
+Each known registered character encoding is identified by an enumerator
+in `text_encoding::id`, and has a set of zero or more *aliases*.
+The set of aliases of a known registered character encoding is an
+*implementation-defined* superset of the aliases specified in the IANA
+Character Sets registry. The set of aliases for US-ASCII includes
+“ASCII”. No two aliases or primary names of distinct registered
+character encodings are equivalent when compared by
+`text_encoding::comp-name`.
+How a `text_encoding` object is determined to be representative of a
+character encoding scheme implemented in the translation or execution
+environment is *implementation-defined*.
+An object `e` of type `text_encoding` such that
+`e.mib() == text_encoding::id::unknown` is `false` and
+`e.mib() == text_encoding::id::other` is `false` maintains the following
+invariants:
+- `*e.name() == '\0'` is `false`, and
+- `e.mib() == text_encoding(e.name()).mib()` is `true`.
+*Recommended practice:*
+- Implementations should not consider registered encodings to be
+  interchangeable. \[*Example 1*: Shift_JIS and Windows-31J denote
+  different encodings. — *end example*]
+- Implementations should not use the name of a registered encoding to
+  describe another similar yet different non-registered encoding unless
+  there is a precedent on that implementation.
+  \[*Example 2*: Big5 — *end example*]
+#### Members <a id="text.encoding.members">[[text.encoding.members]]</a>
+``` cpp
+constexpr explicit text_encoding(string_view enc) noexcept;
+```
+*Preconditions:*
+- `enc` represents a string in the ordinary literal encoding consisting
+  only of elements of the basic character set [[lex.charset]].
+- `enc.size() <= max_name_length` is `true`.
+- `enc.contains('\0')` is `false`.
+*Ensures:*
+- If there exists a primary name or alias `a` of a known registered
+  character encoding such that *`comp-name`*`(a, enc)` is `true`,
+  *`mib_`* has the value of the enumerator of `id` associated with that
+  registered character encoding. Otherwise, *`mib_`*` == id::other` is
+  `true`.
+- `enc.compare(`*`name_`*`) == 0` is `true`.
+``` cpp
+constexpr text_encoding(id i) noexcept;
+```
+*Preconditions:* `i` has the value of one of the enumerators of `id`.
+*Ensures:*
+- *`mib_`*` == i` is `true`.
+- If `(`*`mib_`*` == id::unknown || `*`mib_`*` == id::other)` is `true`,
+  `strlen(`*`name_`*`) == 0` is `true`. Otherwise,
+  `ranges::contains(aliases(), string_view(`*`name_`*`))` is `true`.
+``` cpp
+constexpr id mib() const noexcept;
+```
+*Returns:* *`mib_`*.
+``` cpp
+constexpr const char* name() const noexcept;
+```
+*Returns:* *`name_`*.
+*Remarks:* `name()` is an NTBS and accessing elements of *`name_`*
+outside of the range `name()`+\[0, `strlen(name()) + 1`) is undefined
+behavior.
+``` cpp
+constexpr aliases_view aliases() const noexcept;
+```
+Let `r` denote an instance of `aliases_view`. If `*this` represents a
+known registered character encoding, then:
+- `r.front()` is the primary name of the registered character encoding,
+- `r` contains the aliases of the registered character encoding, and
+- `r` does not contain duplicate values when compared with `strcmp`.
+Otherwise, `r` is an empty range.
+Each element in `r` is a non-null, non-empty NTBS encoded in the literal
+character encoding and comprising only characters from the basic
+character set.
+*Returns:* `r`.
+[*Note 1*: The order of aliases in `r` is unspecified. — *end note*]
+``` cpp
+static consteval text_encoding literal() noexcept;
+```
+*Mandates:* `CHAR_BIT == 8` is `true`.
+*Returns:* A `text_encoding` object representing the ordinary character
+literal encoding [[lex.charset]].
+``` cpp
+static text_encoding environment();
+```
+*Mandates:* `CHAR_BIT == 8` is `true`.
+*Returns:* A `text_encoding` object representing the
+*implementation-defined* character encoding scheme of the environment.
+On a POSIX implementation, this is the encoding scheme associated with
+the POSIX locale denoted by the empty string `""`.
+[*Note 2*: This function is not affected by calls to
+`setlocale`. — *end note*]
+*Recommended practice:* Implementations should return a value that is
+not affected by calls to the POSIX function `setenv` and other functions
+which can modify the environment [[support.runtime]].
+``` cpp
+template<id i>
+  static bool environment_is();
+```
+*Mandates:* `CHAR_BIT == 8` is `true`.
+*Returns:* `environment() == i`.
+``` cpp
+static constexpr bool comp-name(string_view a, string_view b);
+```
+*Returns:* `true` if the two strings `a` and `b` encoded in the ordinary
+literal encoding are equal, ignoring, from left-to-right,
+- all elements that are not digits or letters [[character.seq.general]],
+- character case, and
+- any sequence of one or more `0` characters not immediately preceded by
+  a numeric prefix, where a numeric prefix is a sequence consisting of a
+  digit in the range \[`1`, `9`\] optionally followed by one or more
+  elements which are not digits or letters,
+and `false` otherwise.
+[*Note 3*: This comparison is identical to the “Charset Alias Matching”
+algorithm described in the Unicode Technical Standard 22. — *end note*]
+[*Example 1*:
+``` cpp
+static_assert(comp-name("UTF-8", "utf8") == true);
+static_assert(comp-name("u.t.f-008", "utf8") == true);
+static_assert(comp-name("ut8", "utf8") == false);
+static_assert(comp-name("utf-80", "utf8") == false);
+```
+— *end example*]
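The matching rules above can be sketched in portable code. The following is a minimal illustration, not the normative algorithm and not any library's implementation: the namespace `sketch`, the `cursor` helper, and the function name `comp_name` are all hypothetical, and the character classification assumes an ASCII-compatible ordinary literal encoding in which the letters are contiguous.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {  // hypothetical namespace, used for illustration only

// Assumption: ASCII-compatible literal encoding (contiguous letters).
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_upper(char c) { return c >= 'A' && c <= 'Z'; }
constexpr bool is_lower(char c) { return c >= 'a' && c <= 'z'; }
constexpr char to_lower(char c) {
    return is_upper(c) ? static_cast<char>(c - 'A' + 'a') : c;
}

// Walks a name and yields only the characters that take part in the comparison.
struct cursor {
    std::string_view text;
    std::size_t pos = 0;
    bool in_number = false;  // true after a nonzero digit, until the next letter

    // Returns the next significant character (lowercased), or '\0' at the end.
    constexpr char next() {
        while (pos < text.size()) {
            const char c = text[pos++];
            if (is_digit(c)) {
                if (c == '0' && !in_number) continue;  // zero without numeric prefix
                in_number = true;
                return c;
            }
            if (is_upper(c) || is_lower(c)) {
                in_number = false;
                return to_lower(c);
            }
            // Any other element is ignored and leaves in_number unchanged.
        }
        return '\0';
    }
};

// Compares two names by their significant characters only.
constexpr bool comp_name(std::string_view a, std::string_view b) {
    cursor ca{a};
    cursor cb{b};
    for (;;) {
        const char x = ca.next();
        const char y = cb.next();
        if (x != y) return false;
        if (x == '\0') return true;  // both names exhausted together
    }
}

}  // namespace sketch
```

Under these assumptions, the sketch gives the same results as the four comparisons in Example 1; for instance, `sketch::comp_name("u.t.f-008", "utf8")` is `true` and `sketch::comp_name("utf-80", "utf8")` is `false`.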
+#### Comparison functions <a id="text.encoding.cmp">[[text.encoding.cmp]]</a>
+``` cpp
+friend constexpr bool operator==(const text_encoding& a, const text_encoding& b) noexcept;
+```
+*Returns:* If `a.`*`mib_`*` == id::other && b.`*`mib_`*` == id::other`
+is `true`, then *`comp-name`*`(a.`*`name_`*`,b.`*`name_`*`)`. Otherwise,
+`a.`*`mib_`*` == b.`*`mib_`*.
+``` cpp
+friend constexpr bool operator==(const text_encoding& encoding, id i) noexcept;
+```
+*Returns:* `encoding.`*`mib_`*` == i`.
+*Remarks:* This operator induces an equivalence relation on its
+arguments if and only if `i != id::other` is `true`.
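To see why the *Remarks* single out `id::other`, consider a simplified model of the two comparison operators. This is only a sketch: the type `encoding`, the reduced `id` enumeration, and the names `"x-foo"` and `"x-bar"` are hypothetical, and exact string equality stands in for *`comp-name`*.

```cpp
#include <string_view>

namespace sketch {  // hypothetical model, not the real std::text_encoding

enum class id : int { other = 1, unknown = 2, UTF8 = 106 };

struct encoding {
    id mib = id::unknown;
    std::string_view name{};
};

// Mirrors the specified rule: names matter only when both sides are "other".
constexpr bool operator==(const encoding& a, const encoding& b) {
    if (a.mib == id::other && b.mib == id::other)
        return a.name == b.name;  // assumption: stand-in for comp-name
    return a.mib == b.mib;
}

// Mirrors the specified rule: only the MIB value is compared.
constexpr bool operator==(const encoding& e, id i) { return e.mib == i; }

// Two distinct unregistered encodings (hypothetical names).
constexpr encoding foo{id::other, "x-foo"};
constexpr encoding bar{id::other, "x-bar"};

static_assert(foo == id::other);
static_assert(bar == id::other);
static_assert(!(foo == bar));

}  // namespace sketch
```

In this model both objects compare equal to `id::other`, yet they do not compare equal to each other, so the comparison against `id::other` cannot be used to conclude that two encodings are the same.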
+#### Class `text_encoding::aliases_view` <a id="text.encoding.aliases">[[text.encoding.aliases]]</a>
+``` cpp
+struct text_encoding::aliases_view : ranges::view_interface<text_encoding::aliases_view> {
+  constexpr implementation-defined begin() const;  // type of text_encoding::aliases_view::begin()
+  constexpr implementation-defined end() const;    // type of text_encoding::aliases_view::end()
+};
+```
+`text_encoding::aliases_view` models `copyable`, `ranges::view`,
+`ranges::random_access_range`, and `ranges::borrowed_range`.
+[*Note 1*: `text_encoding::aliases_view` is not required to satisfy
+`ranges::common_range`, nor `default_initializable`. — *end note*]
+Both `ranges::range_value_t<text_encoding::aliases_view>` and
+`ranges::range_reference_t<text_encoding::aliases_view>` denote
+`const char*`.
+`ranges::iterator_t<text_encoding::aliases_view>` is a constexpr
+iterator [[iterator.requirements.general]].
+#### Enumeration `text_encoding::id` <a id="text.encoding.id">[[text.encoding.id]]</a>
+``` cpp
+namespace std {
+  enum class text_encoding::id : int_least32_t {
+    other = 1,
+    unknown = 2,
+    ASCII = 3,
+    ISOLatin1 = 4,
+    ISOLatin2 = 5,
+    ISOLatin3 = 6,
+    ISOLatin4 = 7,
+    ISOLatinCyrillic = 8,
+    ISOLatinArabic = 9,
+    ISOLatinGreek = 10,
+    ISOLatinHebrew = 11,
+    ISOLatin5 = 12,
+    ISOLatin6 = 13,
+    ISOTextComm = 14,
+    HalfWidthKatakana = 15,
+    JISEncoding = 16,
+    ShiftJIS = 17,
+    EUCPkdFmtJapanese = 18,
+    EUCFixWidJapanese = 19,
+    ISO4UnitedKingdom = 20,
+    ISO11SwedishForNames = 21,
+    ISO15Italian = 22,
+    ISO17Spanish = 23,
+    ISO21German = 24,
+    ISO60DanishNorwegian = 25,
+    ISO69French = 26,
+    ISO10646UTF1 = 27,
+    ISO646basic1983 = 28,
+    INVARIANT = 29,
+    ISO2IntlRefVersion = 30,
+    NATSSEFI = 31,
+    NATSSEFIADD = 32,
+    ISO10Swedish = 35,
+    KSC56011987 = 36,
+    ISO2022KR = 37,
+    EUCKR = 38,
+    ISO2022JP = 39,
+    ISO2022JP2 = 40,
+    ISO13JISC6220jp = 41,
+    ISO14JISC6220ro = 42,
+    ISO16Portuguese = 43,
+    ISO18Greek7Old = 44,
+    ISO19LatinGreek = 45,
+    ISO25French = 46,
+    ISO27LatinGreek1 = 47,
+    ISO5427Cyrillic = 48,
+    ISO42JISC62261978 = 49,
+    ISO47BSViewdata = 50,
+    ISO49INIS = 51,
+    ISO50INIS8 = 52,
+    ISO51INISCyrillic = 53,
+    ISO54271981 = 54,
+    ISO5428Greek = 55,
+    ISO57GB1988 = 56,
+    ISO58GB231280 = 57,
+    ISO61Norwegian2 = 58,
+    ISO70VideotexSupp1 = 59,
+    ISO84Portuguese2 = 60,
+    ISO85Spanish2 = 61,
+    ISO86Hungarian = 62,
+    ISO87JISX0208 = 63,
+    ISO88Greek7 = 64,
+    ISO89ASMO449 = 65,
+    ISO90 = 66,
+    ISO91JISC62291984a = 67,
+    ISO92JISC62991984b = 68,
+    ISO93JIS62291984badd = 69,
+    ISO94JIS62291984hand = 70,
+    ISO95JIS62291984handadd = 71,
+    ISO96JISC62291984kana = 72,
+    ISO2033 = 73,
+    ISO99NAPLPS = 74,
+    ISO102T617bit = 75,
+    ISO103T618bit = 76,
+    ISO111ECMACyrillic = 77,
+    ISO121Canadian1 = 78,
+    ISO122Canadian2 = 79,
+    ISO123CSAZ24341985gr = 80,
+    ISO88596E = 81,
+    ISO88596I = 82,
+    ISO128T101G2 = 83,
+    ISO88598E = 84,
+    ISO88598I = 85,
+    ISO139CSN369103 = 86,
+    ISO141JUSIB1002 = 87,
+    ISO143IECP271 = 88,
+    ISO146Serbian = 89,
+    ISO147Macedonian = 90,
+    ISO150 = 91,
+    ISO151Cuba = 92,
+    ISO6937Add = 93,
+    ISO153GOST1976874 = 94,
+    ISO8859Supp = 95,
+    ISO10367Box = 96,
+    ISO158Lap = 97,
+    ISO159JISX02121990 = 98,
+    ISO646Danish = 99,
+    USDK = 100,
+    DKUS = 101,
+    KSC5636 = 102,
+    Unicode11UTF7 = 103,
+    ISO2022CN = 104,
+    ISO2022CNEXT = 105,
+    UTF8 = 106,
+    ISO885913 = 109,
+    ISO885914 = 110,
+    ISO885915 = 111,
+    ISO885916 = 112,
+    GBK = 113,
+    GB18030 = 114,
+    OSDEBCDICDF0415 = 115,
+    OSDEBCDICDF03IRV = 116,
+    OSDEBCDICDF041 = 117,
+    ISO115481 = 118,
+    KZ1048 = 119,
+    UCS2 = 1000,
+    UCS4 = 1001,
+    UnicodeASCII = 1002,
+    UnicodeLatin1 = 1003,
+    UnicodeJapanese = 1004,
+    UnicodeIBM1261 = 1005,
+    UnicodeIBM1268 = 1006,
+    UnicodeIBM1276 = 1007,
+    UnicodeIBM1264 = 1008,
+    UnicodeIBM1265 = 1009,
+    Unicode11 = 1010,
+    SCSU = 1011,
+    UTF7 = 1012,
+    UTF16BE = 1013,
+    UTF16LE = 1014,
+    UTF16 = 1015,
+    CESU8 = 1016,
+    UTF32 = 1017,
+    UTF32BE = 1018,
+    UTF32LE = 1019,
+    BOCU1 = 1020,
+    UTF7IMAP = 1021,
+    Windows30Latin1 = 2000,
+    Windows31Latin1 = 2001,
+    Windows31Latin2 = 2002,
+    Windows31Latin5 = 2003,
+    HPRoman8 = 2004,
+    AdobeStandardEncoding = 2005,
+    VenturaUS = 2006,
+    VenturaInternational = 2007,
+    DECMCS = 2008,
+    PC850Multilingual = 2009,
+    PCp852 = 2010,
+    PC8CodePage437 = 2011,
+    PC8DanishNorwegian = 2012,
+    PC862LatinHebrew = 2013,
+    PC8Turkish = 2014,
+    IBMSymbols = 2015,
+    IBMThai = 2016,
+    HPLegal = 2017,
+    HPPiFont = 2018,
+    HPMath8 = 2019,
+    HPPSMath = 2020,
+    HPDesktop = 2021,
+    VenturaMath = 2022,
+    MicrosoftPublishing = 2023,
+    Windows31J = 2024,
+    GB2312 = 2025,
+    Big5 = 2026,
+    Macintosh = 2027,
+    IBM037 = 2028,
+    IBM038 = 2029,
+    IBM273 = 2030,
+    IBM274 = 2031,
+    IBM275 = 2032,
+    IBM277 = 2033,
+    IBM278 = 2034,
+    IBM280 = 2035,
+    IBM281 = 2036,
+    IBM284 = 2037,
+    IBM285 = 2038,
+    IBM290 = 2039,
+    IBM297 = 2040,
+    IBM420 = 2041,
+    IBM423 = 2042,
+    IBM424 = 2043,
+    IBM500 = 2044,
+    IBM851 = 2045,
+    IBM855 = 2046,
+    IBM857 = 2047,
+    IBM860 = 2048,
+    IBM861 = 2049,
+    IBM863 = 2050,
+    IBM864 = 2051,
+    IBM865 = 2052,
+    IBM868 = 2053,
+    IBM869 = 2054,
+    IBM870 = 2055,
+    IBM871 = 2056,
+    IBM880 = 2057,
+    IBM891 = 2058,
+    IBM903 = 2059,
+    IBM904 = 2060,
+    IBM905 = 2061,
+    IBM918 = 2062,
+    IBM1026 = 2063,
+    IBMEBCDICATDE = 2064,
+    EBCDICATDEA = 2065,
+    EBCDICCAFR = 2066,
+    EBCDICDKNO = 2067,
+    EBCDICDKNOA = 2068,
+    EBCDICFISE = 2069,
+    EBCDICFISEA = 2070,
+    EBCDICFR = 2071,
+    EBCDICIT = 2072,
+    EBCDICPT = 2073,
+    EBCDICES = 2074,
+    EBCDICESA = 2075,
+    EBCDICESS = 2076,
+    EBCDICUK = 2077,
+    EBCDICUS = 2078,
+    Unknown8BiT = 2079,
+    Mnemonic = 2080,
+    Mnem = 2081,
+    VISCII = 2082,
+    VIQR = 2083,
+    KOI8R = 2084,
+    HZGB2312 = 2085,
+    IBM866 = 2086,
+    PC775Baltic = 2087,
+    KOI8U = 2088,
+    IBM00858 = 2089,
+    IBM00924 = 2090,
+    IBM01140 = 2091,
+    IBM01141 = 2092,
+    IBM01142 = 2093,
+    IBM01143 = 2094,
+    IBM01144 = 2095,
+    IBM01145 = 2096,
+    IBM01146 = 2097,
+    IBM01147 = 2098,
+    IBM01148 = 2099,
+    IBM01149 = 2100,
+    Big5HKSCS = 2101,
+    IBM1047 = 2102,
+    PTCP154 = 2103,
+    Amiga1251 = 2104,
+    KOI7switched = 2105,
+    BRF = 2106,
+    TSCII = 2107,
+    CP51932 = 2108,
+    windows874 = 2109,
+    windows1250 = 2250,
+    windows1251 = 2251,
+    windows1252 = 2252,
+    windows1253 = 2253,
+    windows1254 = 2254,
+    windows1255 = 2255,
+    windows1256 = 2256,
+    windows1257 = 2257,
+    windows1258 = 2258,
+    TIS620 = 2259,
+    CP50220 = 2260
+  };
+}
+```
+[*Note 1*:
+The `text_encoding::id` enumeration contains an enumerator for each
+known registered character encoding. For each encoding, the
+corresponding enumerator is derived from the alias beginning with
+“`cs`”, as follows:
+- `csUnicode` is mapped to `text_encoding::id::UCS2`,
+- `csIBBM904` is mapped to `text_encoding::id::IBM904`, and
+- the “`cs`” prefix is removed from other names.
+— *end note*]
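The derivation described in the note can be expressed as a small function. This is an illustrative sketch only: the name `enumerator_name` and the namespace `sketch` are hypothetical, and the function maps alias spellings to enumerator spellings as text rather than to `text_encoding::id` values.

```cpp
#include <string_view>

namespace sketch {  // hypothetical namespace, used for illustration only

// Maps an IANA alias beginning with "cs" to the spelling of the enumerator.
// Returns an empty view when the alias does not begin with "cs".
// The result may refer to the argument's storage, so it must not outlive it.
constexpr std::string_view enumerator_name(std::string_view cs_alias) {
    // The two special cases listed in the note.
    if (cs_alias == "csUnicode") return "UCS2";
    if (cs_alias == "csIBBM904") return "IBM904";
    // All other names: drop the "cs" prefix.
    if (cs_alias.substr(0, 2) == "cs") return cs_alias.substr(2);
    return {};
}

}  // namespace sketch
```

For example, `sketch::enumerator_name("csUTF8")` yields `"UTF8"`, which matches the enumerator `text_encoding::id::UTF8` in the listing above.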
+#### Hash support <a id="text.encoding.hash">[[text.encoding.hash]]</a>
+``` cpp
+template<> struct hash<text_encoding>;
+```
+The specialization is enabled [[unord.hash]].
