From Jason Turner

[basic.extended.fp]

Files changed (1)
  1. tmp/tmpse3gulrz/{from.md → to.md} +23 -25
tmp/tmpse3gulrz/{from.md → to.md} RENAMED
@@ -1,65 +1,63 @@
1
  ### Optional extended floating-point types <a id="basic.extended.fp">[[basic.extended.fp]]</a>
2
 
3
  If the implementation supports an extended floating-point type
4
- [[basic.fundamental]] whose properties are specified by the ISO/IEC/IEEE
5
  60559 floating-point interchange format binary16, then the
6
- *typedef-name* `std::float16_t` is defined in the header `<stdfloat>`
7
  and names such a type, the macro `__STDCPP_FLOAT16_T__` is defined
8
  [[cpp.predefined]], and the floating-point literal suffixes `f16` and
9
  `F16` are supported [[lex.fcon]].
10
 
11
  If the implementation supports an extended floating-point type whose
12
- properties are specified by the ISO/IEC/IEEE 60559 floating-point
13
- interchange format binary32, then the *typedef-name* `std::float32_t` is
14
- defined in the header `<stdfloat>` and names such a type, the macro
15
  `__STDCPP_FLOAT32_T__` is defined, and the floating-point literal
16
  suffixes `f32` and `F32` are supported.
17
 
18
  If the implementation supports an extended floating-point type whose
19
- properties are specified by the ISO/IEC/IEEE 60559 floating-point
20
- interchange format binary64, then the *typedef-name* `std::float64_t` is
21
- defined in the header `<stdfloat>` and names such a type, the macro
22
  `__STDCPP_FLOAT64_T__` is defined, and the floating-point literal
23
  suffixes `f64` and `F64` are supported.
24
 
25
  If the implementation supports an extended floating-point type whose
26
- properties are specified by the ISO/IEC/IEEE 60559 floating-point
27
- interchange format binary128, then the *typedef-name* `std::float128_t`
28
- is defined in the header `<stdfloat>` and names such a type, the macro
29
  `__STDCPP_FLOAT128_T__` is defined, and the floating-point literal
30
  suffixes `f128` and `F128` are supported.
31
 
32
  If the implementation supports an extended floating-point type with the
33
- properties, as specified by ISO/IEC/IEEE 60559, of radix (b) of 2,
34
- storage width in bits (k) of 16, precision in bits (p) of 8, maximum
35
- exponent (emax) of 127, and exponent field width in bits (w) of 8, then
36
- the *typedef-name* `std::bfloat16_t` is defined in the header
37
- `<stdfloat>` and names such a type, the macro `__STDCPP_BFLOAT16_T__` is
38
- defined, and the floating-point literal suffixes `bf16` and `BF16` are
39
- supported.
40
 
41
  [*Note 1*: A summary of the parameters for each type is given in
42
  [[basic.extended.fp]]. The precision p includes the implicit 1 bit at
43
- the beginning of the mantissa, so the storage used for the mantissa is
44
- p-1 bits. ISO/IEC/IEEE 60559 does not assign a name for a type having
45
- the parameters specified for `std::bfloat16_t`. — *end note*]
46
 
47
  **Table: Properties of named extended floating-point types** <a id="basic.extended.fp">[basic.extended.fp]</a>
48
 
49
  | Parameter | `float16_t` | `float32_t` | `float64_t` | `float128_t` | `bfloat16_t` |
50
  | --------------------------------- | ----------- | ----------- | ----------- | ------------ | ------------ |
51
- | ISO/IEC/IEEE 60559 name | binary16 | binary32 | binary64 | binary128 | |
52
  | $k$, storage width in bits | 16 | 32 | 64 | 128 | 16 |
53
  | $p$, precision in bits | 11 | 24 | 53 | 113 | 8 |
54
  | $emax$, maximum exponent | 15 | 127 | 1023 | 16383 | 127 |
55
  | $w$, exponent field width in bits | 5 | 8 | 11 | 15 | 8 |
56
 
57
 
58
  *Recommended practice:* Any names that the implementation provides for
59
  the extended floating-point types described in this subsection that are
60
- in addition to the names defined in the `<stdfloat>` header should be
61
  chosen to increase compatibility and interoperability with the
62
  interchange types `_Float16`, `_Float32`, `_Float64`, and `_Float128`
63
- defined in ISO/IEC TS 18661-3 and with future versions of the C
64
- standard.
65
 
 
1
  ### Optional extended floating-point types <a id="basic.extended.fp">[[basic.extended.fp]]</a>
2
 
3
  If the implementation supports an extended floating-point type
4
+ [[basic.fundamental]] whose properties are specified by the ISO/IEC
5
  60559 floating-point interchange format binary16, then the
6
+ *typedef-name* `std::float16_t` is declared in the header `<stdfloat>`
7
  and names such a type, the macro `__STDCPP_FLOAT16_T__` is defined
8
  [[cpp.predefined]], and the floating-point literal suffixes `f16` and
9
  `F16` are supported [[lex.fcon]].
10
 
11
  If the implementation supports an extended floating-point type whose
12
+ properties are specified by the ISO/IEC 60559 floating-point interchange
13
+ format binary32, then the *typedef-name* `std::float32_t` is declared in
14
+ the header `<stdfloat>` and names such a type, the macro
15
  `__STDCPP_FLOAT32_T__` is defined, and the floating-point literal
16
  suffixes `f32` and `F32` are supported.
17
 
18
  If the implementation supports an extended floating-point type whose
19
+ properties are specified by the ISO/IEC 60559 floating-point interchange
20
+ format binary64, then the *typedef-name* `std::float64_t` is declared in
21
+ the header `<stdfloat>` and names such a type, the macro
22
  `__STDCPP_FLOAT64_T__` is defined, and the floating-point literal
23
  suffixes `f64` and `F64` are supported.
24
 
25
  If the implementation supports an extended floating-point type whose
26
+ properties are specified by the ISO/IEC 60559 floating-point interchange
27
+ format binary128, then the *typedef-name* `std::float128_t` is declared
28
+ in the header `<stdfloat>` and names such a type, the macro
29
  `__STDCPP_FLOAT128_T__` is defined, and the floating-point literal
30
  suffixes `f128` and `F128` are supported.
31
 
32
  If the implementation supports an extended floating-point type with the
33
+ properties, as specified by ISO/IEC 60559, of radix (b) of 2, storage
34
+ width in bits (k) of 16, precision in bits (p) of 8, maximum exponent
35
+ (emax) of 127, and exponent field width in bits (w) of 8, then the
36
+ *typedef-name* `std::bfloat16_t` is declared in the header `<stdfloat>`
37
+ and names such a type, the macro `__STDCPP_BFLOAT16_T__` is defined, and
38
+ the floating-point literal suffixes `bf16` and `BF16` are supported.
 
39
 
40
  [*Note 1*: A summary of the parameters for each type is given in
41
  [[basic.extended.fp]]. The precision p includes the implicit 1 bit at
42
+ the beginning of the significand, so the storage used for the
43
+ significand is p-1 bits. ISO/IEC 60559 does not assign a name for a type
44
+ having the parameters specified for `std::bfloat16_t`. — *end note*]
45
 
46
  **Table: Properties of named extended floating-point types** <a id="basic.extended.fp">[basic.extended.fp]</a>
47
 
48
  | Parameter | `float16_t` | `float32_t` | `float64_t` | `float128_t` | `bfloat16_t` |
49
  | --------------------------------- | ----------- | ----------- | ----------- | ------------ | ------------ |
50
+ | ISO/IEC 60559 name | binary16 | binary32 | binary64 | binary128 | |
51
  | $k$, storage width in bits | 16 | 32 | 64 | 128 | 16 |
52
  | $p$, precision in bits | 11 | 24 | 53 | 113 | 8 |
53
  | $emax$, maximum exponent | 15 | 127 | 1023 | 16383 | 127 |
54
  | $w$, exponent field width in bits | 5 | 8 | 11 | 15 | 8 |
55
 
56
 
57
  *Recommended practice:* Any names that the implementation provides for
58
  the extended floating-point types described in this subsection that are
59
+ in addition to the names declared in the `<stdfloat>` header should be
60
  chosen to increase compatibility and interoperability with the
61
  interchange types `_Float16`, `_Float32`, `_Float64`, and `_Float128`
62
+ defined in ISO/IEC TS 18661-3 and with future versions of ISO/IEC 9899.
 
63
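
The subsection ties together three things: a predefined macro, a *typedef-name* in `<stdfloat>`, and a literal suffix, all described by the table parameters. The following is a minimal sketch of how a program might use them, not a reference implementation. The names `FormatParams`, `is_consistent`, `real32`, `one_third`, `uses_std_float32`, `self_check`, and `DEMO_HAS_STDFLOAT_HEADER` are hypothetical and introduced only for illustration. The relation `emax == 2^(w-1) - 1` is not stated in the text; it is an observation that holds for every column of the table. The fallback to `float` is likewise an assumption about what a program might do when `std::float32_t` is unavailable.

```cpp
#include <cassert>
#include <limits>

// Include <stdfloat> only when the toolchain ships it (assumption: the
// program must also build on implementations without the header).
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#    define DEMO_HAS_STDFLOAT_HEADER 1  // hypothetical helper macro
#  endif
#endif

// One column of the table; member names follow the table's symbols.
struct FormatParams {
    int k;     // storage width in bits
    int p;     // precision in bits, including the implicit leading bit
    int emax;  // maximum exponent
    int w;     // exponent field width in bits
};

// Values copied from the table.
constexpr FormatParams float16_params  = {16, 11, 15, 5};
constexpr FormatParams float32_params  = {32, 24, 127, 8};
constexpr FormatParams float64_params  = {64, 53, 1023, 11};
constexpr FormatParams float128_params = {128, 113, 16383, 15};
constexpr FormatParams bfloat16_params = {16, 8, 127, 8};

// Storage holds 1 sign bit, w exponent bits, and p-1 significand bits
// (Note 1). The emax relation is an observation about the table values.
constexpr bool is_consistent(FormatParams f) {
    return 1 + f.w + (f.p - 1) == f.k && f.emax == (1 << (f.w - 1)) - 1;
}

static_assert(is_consistent(float16_params), "float16_t column");
static_assert(is_consistent(float32_params), "float32_t column");
static_assert(is_consistent(float64_params), "float64_t column");
static_assert(is_consistent(float128_params), "float128_t column");
static_assert(is_consistent(bfloat16_params), "bfloat16_t column");

#if defined(DEMO_HAS_STDFLOAT_HEADER) && defined(__STDCPP_FLOAT32_T__)
// The macro signals that std::float32_t and the f32/F32 suffixes exist.
using real32 = std::float32_t;
constexpr real32 one_third = 1.0f32 / 3.0f32;
constexpr bool uses_std_float32 = true;
// numeric_limits reports p directly and emax + 1 as max_exponent.
static_assert(std::numeric_limits<real32>::radix == 2, "radix b");
static_assert(std::numeric_limits<real32>::digits == float32_params.p, "p");
static_assert(std::numeric_limits<real32>::max_exponent - 1 ==
                  float32_params.emax, "emax");
#else
// Fallback chosen for this sketch only; float is not guaranteed to be binary32.
using real32 = float;
constexpr real32 one_third = 1.0f / 3.0f;
constexpr bool uses_std_float32 = false;
#endif

// Runtime sanity check that works in either branch.
inline void self_check() {
    assert(static_cast<double>(one_third) > 0.333);
    assert(static_cast<double>(one_third) < 0.334);
    assert(is_consistent(float32_params));
}
```

Calling `self_check()` from `main` exercises whichever branch was selected. The same pattern applies to the other types by testing `__STDCPP_FLOAT16_T__`, `__STDCPP_FLOAT64_T__`, `__STDCPP_FLOAT128_T__`, or `__STDCPP_BFLOAT16_T__` and using the matching suffix.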