- tmp/tmpwj5n5_fd/{from.md → to.md} +302 -138
tmp/tmpwj5n5_fd/{from.md → to.md}
RENAMED
|
@@ -1,65 +1,50 @@
|
|
| 1 |
## Multi-threaded executions and data races <a id="intro.multithread">[[intro.multithread]]</a>
|
| 2 |
|
| 3 |
A *thread of execution* (also known as a *thread*) is a single flow of
|
| 4 |
control within a program, including the initial invocation of a specific
|
| 5 |
top-level function, and recursively including every function invocation
|
| 6 |
-
subsequently executed by the thread.
|
| 7 |
-
the initial call to the top-level function of the new thread is executed
|
| 8 |
-
by the new thread, not by the creating thread. Every thread in a program
|
| 9 |
-
can potentially access every object and function in a program.[^10]
|
| 10 |
-
Under a hosted implementation, a C++ program can have more than one
|
| 11 |
-
thread running concurrently. The execution of each thread proceeds as
|
| 12 |
-
defined by the remainder of this standard. The execution of the entire
|
| 13 |
-
program consists of an execution of all of its threads. Usually the
|
| 14 |
-
execution can be viewed as an interleaving of all its threads. However,
|
| 15 |
-
some kinds of atomic operations, for example, allow executions
|
| 16 |
-
inconsistent with a simple interleaving, as described below. Under a
|
| 17 |
-
freestanding implementation, it is *implementation-defined* whether a
|
| 18 |
-
program can have more than one thread of execution.
|
| 19 |
|
| 20 |
-
|
| 21 |
-
function
|
| 22 |
-
|
| 23 |
-
contains a signal handler invocation.
|
| 24 |
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
thread
|
| 29 |
-
|
| 30 |
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
inopportune interference from other threads may prevent forward
|
| 44 |
-
progress, e.g., by repeatedly stealing a cache line for unrelated
|
| 45 |
-
purposes between load-locked and store-conditional instructions.
|
| 46 |
-
Implementations should ensure that such effects cannot indefinitely
|
| 47 |
-
delay progress under expected operating conditions, and that such
|
| 48 |
-
anomalies can therefore safely be ignored by programmers. Outside this
|
| 49 |
-
International Standard, this property is sometimes termed lock-free.
|
| 50 |
|
| 51 |
The value of an object visible to a thread *T* at a particular point is
|
| 52 |
the initial value of the object, a value assigned to the object by *T*,
|
| 53 |
or a value assigned to the object by another thread, according to the
|
| 54 |
-
rules below.
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
|
| 59 |
Two expression evaluations *conflict* if one of them modifies a memory
|
| 60 |
-
location ([[intro.memory]]) and the other one
|
| 61 |
same memory location.
|
| 62 |
|
| 63 |
The library defines a number of atomic operations (Clause [[atomics]])
|
| 64 |
and operations on mutexes (Clause [[thread]]) that are specially
|
| 65 |
identified as synchronization operations. These operations play a
|
|
@@ -69,30 +54,30 @@ consume operation, an acquire operation, a release operation, or both an
|
|
| 69 |
acquire and release operation. A synchronization operation without an
|
| 70 |
associated memory location is a fence and can be either an acquire
|
| 71 |
fence, a release fence, or both an acquire and release fence. In
|
| 72 |
addition, there are relaxed atomic operations, which are not
|
| 73 |
synchronization operations, and atomic read-modify-write operations,
|
| 74 |
-
which have special characteristics.
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
release operation on *A* forces prior side effects on other memory
|
| 79 |
locations to become visible to other threads that later perform a
|
| 80 |
consume or an acquire operation on *A*. “Relaxed” atomic operations are
|
| 81 |
not synchronization operations even though, like synchronization
|
| 82 |
-
operations, they cannot contribute to data races.
|
| 83 |
|
| 84 |
All modifications to a particular atomic object *M* occur in some
|
| 85 |
-
particular total order, called the *modification order* of *M*.
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
general this will be impossible since different threads may observe
|
| 93 |
-
modifications to different objects in inconsistent orders.
|
| 94 |
|
| 95 |
A *release sequence* headed by a release operation *A* on an atomic
|
| 96 |
object *M* is a maximal contiguous sub-sequence of side effects in the
|
| 97 |
modification order of *M*, where the first operation is `A`, and every
|
| 98 |
subsequent operation
|
|
@@ -101,18 +86,21 @@ subsequent operation
|
|
| 101 |
- is an atomic read-modify-write operation.
|
| 102 |
|
| 103 |
Certain library calls *synchronize with* other library calls performed
|
| 104 |
by another thread. For example, an atomic store-release synchronizes
|
| 105 |
with a load-acquire that takes its value from the store (
|
| 106 |
-
[[atomics.order]]).
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
|
| 115 |
An evaluation *A* *carries a dependency* to an evaluation *B* if
|
| 116 |
|
| 117 |
- the value of *A* is used as an operand of *B*, unless:
|
| 118 |
- *B* is an invocation of any specialization of
|
|
@@ -129,25 +117,25 @@ An evaluation *A* *carries a dependency* to an evaluation *B* if
|
|
| 129 |
- *A* writes a scalar object or bit-field *M*, *B* reads the value
|
| 130 |
written by *A* from *M*, and *A* is sequenced before *B*, or
|
| 131 |
- for some evaluation *X*, *A* carries a dependency to *X*, and *X*
|
| 132 |
carries a dependency to *B*.
|
| 133 |
|
| 134 |
-
“Carries a dependency to” is a subset of “is sequenced
|
| 135 |
-
similarly strictly intra-thread.
|
| 136 |
|
| 137 |
An evaluation *A* is *dependency-ordered before* an evaluation *B* if
|
| 138 |
|
| 139 |
- *A* performs a release operation on an atomic object *M*, and, in
|
| 140 |
another thread, *B* performs a consume operation on *M* and reads a
|
| 141 |
value written by any side effect in the release sequence headed by
|
| 142 |
*A*, or
|
| 143 |
- for some evaluation *X*, *A* is dependency-ordered before *X* and *X*
|
| 144 |
carries a dependency to *B*.
|
| 145 |
|
| 146 |
-
The relation “is dependency-ordered before” is analogous to
|
| 147 |
“synchronizes with”, but uses release/consume in place of
|
| 148 |
-
release/acquire.
|
| 149 |
|
| 150 |
An evaluation *A* *inter-thread happens before* an evaluation *B* if
|
| 151 |
|
| 152 |
- *A* synchronizes with *B*, or
|
| 153 |
- *A* is dependency-ordered before *B*, or
|
|
@@ -156,12 +144,12 @@ An evaluation *A* *inter-thread happens before* an evaluation *B* if
|
|
| 156 |
- *A* is sequenced before *X* and *X* inter-thread happens before *B*,
|
| 157 |
or
|
| 158 |
- *A* inter-thread happens before *X* and *X* inter-thread happens
|
| 159 |
before *B*.
|
| 160 |
|
| 161 |
-
The “inter-thread happens before” relation describes
|
| 162 |
-
concatenations of “sequenced before”, “synchronizes with” and
|
| 163 |
“dependency-ordered before” relationships, with two exceptions. The
|
| 164 |
first exception is that a concatenation is not permitted to end with
|
| 165 |
“dependency-ordered before” followed by “sequenced before”. The reason
|
| 166 |
for this limitation is that a consume operation participating in a
|
| 167 |
“dependency-ordered before” relationship provides ordering only with
|
|
@@ -171,140 +159,316 @@ such a concatenation is that any subsequent release operation will
|
|
| 171 |
provide the required ordering for a prior consume operation. The second
|
| 172 |
exception is that a concatenation is not permitted to consist entirely
|
| 173 |
of “sequenced before”. The reasons for this limitation are (1) to permit
|
| 174 |
“inter-thread happens before” to be transitively closed and (2) the
|
| 175 |
“happens before” relation, defined below, provides for relationships
|
| 176 |
-
consisting entirely of “sequenced before”.
|
| 177 |
|
| 178 |
-
An evaluation *A* *happens before* an evaluation *B*
|
| 179 |
|
| 180 |
- *A* is sequenced before *B*, or
|
| 181 |
- *A* inter-thread happens before *B*.
|
| 182 |
|
| 183 |
The implementation shall ensure that no program execution demonstrates a
|
| 184 |
-
cycle in the “happens before” relation.
|
| 185 |
-
|
| 186 |
|
| 187 |
A *visible side effect* *A* on a scalar object or bit-field *M* with
|
| 188 |
respect to a value computation *B* of *M* satisfies the conditions:
|
| 189 |
|
| 190 |
- *A* happens before *B* and
|
| 191 |
- there is no other side effect *X* to *M* such that *A* happens before
|
| 192 |
*X* and *X* happens before *B*.
|
| 193 |
|
| 194 |
The value of a non-atomic scalar object or bit-field *M*, as determined
|
| 195 |
by evaluation *B*, shall be the value stored by the visible side effect
|
| 196 |
-
*A*.
|
| 197 |
-
|
| 198 |
-
|
| 199 |
visibly reordered. This is not actually detectable without data races,
|
| 200 |
but it is necessary to ensure that data races, as defined below, and
|
| 201 |
with suitable restrictions on the use of atomics, correspond to data
|
| 202 |
-
races in a simple interleaved (sequentially consistent)
|
|
| 203 |
|
| 204 |
The value of an atomic object *M*, as determined by evaluation *B*,
|
| 205 |
shall be the value stored by some side effect *A* that modifies *M*,
|
| 206 |
-
where *B* does not happen before *A*.
|
| 207 |
-
|
| 208 |
-
|
| 209 |
|
| 210 |
If an operation *A* that modifies an atomic object *M* happens before an
|
| 211 |
operation *B* that modifies *M*, then *A* shall be earlier than *B* in
|
| 212 |
-
the modification order of *M*.
|
| 213 |
-
|
| 214 |
|
| 215 |
If a value computation *A* of an atomic object *M* happens before a
|
| 216 |
value computation *B* of *M*, and *A* takes its value from a side effect
|
| 217 |
*X* on *M*, then the value computed by *B* shall either be the value
|
| 218 |
stored by *X* or the value stored by a side effect *Y* on *M*, where *Y*
|
| 219 |
-
follows *X* in the modification order of *M*.
|
| 220 |
-
|
| 221 |
|
| 222 |
If a value computation *A* of an atomic object *M* happens before an
|
| 223 |
operation *B* that modifies *M*, then *A* shall take its value from a
|
| 224 |
side effect *X* on *M*, where *X* precedes *B* in the modification order
|
| 225 |
-
of *M*.
|
| 226 |
|
| 227 |
If a side effect *X* on an atomic object *M* happens before a value
|
| 228 |
computation *B* of *M*, then the evaluation *B* shall take its value
|
| 229 |
from *X* or from a side effect *Y* that follows *X* in the modification
|
| 230 |
-
order of *M*.
|
| 231 |
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
operations are relaxed loads. This effectively makes the cache coherence
|
| 235 |
-
guarantee provided by most hardware available to C++ atomic operations.
|
| 236 |
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
|
| 241 |
suitably chosen modification orders and the “happens before” relation
|
| 242 |
derived as described above, satisfy the resulting constraints as imposed
|
| 243 |
-
here.
|
| 244 |
|
| 245 |
Two actions are *potentially concurrent* if
|
| 246 |
|
| 247 |
- they are performed by different threads, or
|
| 248 |
-
- they are unsequenced,
|
| 249 |
-
handler.
|
| 250 |
|
| 251 |
The execution of a program contains a *data race* if it contains two
|
| 252 |
potentially concurrent conflicting actions, at least one of which is not
|
| 253 |
atomic, and neither happens before the other, except for the special
|
| 254 |
case for signal handlers described below. Any such data race results in
|
| 255 |
-
undefined behavior.
|
| 256 |
-
mutexes and `memory_order_seq_cst` operations to prevent all data races
|
| 257 |
-
and use no other synchronization operations behave as if the operations
|
| 258 |
-
executed by their constituent threads were simply interleaved, with each
|
| 259 |
-
value computation of an object being taken from the last side effect on
|
| 260 |
-
that object in that interleaving. This is normally referred to as
|
| 261 |
-
“sequential consistency”. However, this applies only to data-race-free
|
| 262 |
-
programs, and data-race-free programs cannot observe most program
|
| 263 |
-
transformations that do not change single-threaded program semantics. In
|
| 264 |
-
fact, most single-threaded program transformations continue to be
|
| 265 |
-
allowed, since any program that behaves differently as a result must
|
| 266 |
-
perform an undefined operation.
|
| 267 |
|
| 268 |
-
|
| 269 |
-
|
| 271 |
evaluations performed by the thread invoking a signal handler can be
|
| 272 |
divided into two groups *A* and *B*, such that no evaluations in *B*
|
| 273 |
happen before evaluations in *A*, and the evaluations of such
|
| 274 |
-
`volatile sig_atomic_t` objects take values as though all
|
| 275 |
-
*A* happened before the execution of the signal handler
|
| 276 |
-
execution of the signal handler happened before all evaluations
|
| 277 |
|
| 278 |
-
Compiler transformations that introduce assignments to a
|
| 279 |
-
shared memory location that would not be modified by the
|
| 280 |
-
machine are generally precluded by this
|
| 281 |
-
assignment might overwrite another assignment by a
|
| 282 |
-
cases in which an abstract machine execution would
|
| 283 |
-
a data race. This includes implementations of data
|
| 284 |
-
that overwrite adjacent members in separate memory
|
| 285 |
-
of atomic loads in cases in which the atomics in
|
| 286 |
-
also generally precluded, since this may violate
|
| 287 |
|
| 288 |
-
Transformations that introduce a speculative read of a
|
| 289 |
-
shared memory location may not preserve the semantics of the
|
| 290 |
-
as defined in this
|
| 291 |
-
race. However, they are typically valid in
|
| 292 |
-
compiler that targets a specific machine
|
| 293 |
-
data races. They would be invalid for a
|
| 294 |
-
tolerant of races or provides hardware
|
|
| 295 |
|
| 296 |
The implementation may assume that any thread will eventually do one of
|
| 297 |
the following:
|
| 298 |
|
| 299 |
- terminate,
|
| 300 |
- make a call to a library I/O function,
|
| 301 |
-
-
|
| 302 |
- perform a synchronization operation or an atomic operation.
|
| 303 |
|
| 304 |
-
This is intended to allow compiler transformations such as
|
| 305 |
-
empty loops, even when termination cannot be
|
| 306 |
|
| 307 |
An implementation should ensure that the last value (in modification
|
| 308 |
order) assigned by an atomic or synchronization operation will become
|
| 309 |
visible to all other threads in a finite period of time.
|
| 310 |
|
|
| 1 |
## Multi-threaded executions and data races <a id="intro.multithread">[[intro.multithread]]</a>
|
| 2 |
|
| 3 |
A *thread of execution* (also known as a *thread*) is a single flow of
|
| 4 |
control within a program, including the initial invocation of a specific
|
| 5 |
top-level function, and recursively including every function invocation
|
| 6 |
+
subsequently executed by the thread.
|
| 7 |
|
| 8 |
+
[*Note 1*: When one thread creates another, the initial call to the
|
| 9 |
+
top-level function of the new thread is executed by the new thread, not
|
| 10 |
+
by the creating thread. — *end note*]
|
| 11 |
|
| 12 |
+
Every thread in a program can potentially access every object and
|
| 13 |
+
function in a program.[^11] Under a hosted implementation, a C++ program
|
| 14 |
+
can have more than one thread running concurrently. The execution of
|
| 15 |
+
each thread proceeds as defined by the remainder of this International
|
| 16 |
+
Standard. The execution of the entire program consists of an execution
|
| 17 |
+
of all of its threads.
|
| 18 |
|
| 19 |
+
[*Note 2*: Usually the execution can be viewed as an interleaving of
|
| 20 |
+
all its threads. However, some kinds of atomic operations, for example,
|
| 21 |
+
allow executions inconsistent with a simple interleaving, as described
|
| 22 |
+
below. — *end note*]
|
| 23 |
|
| 24 |
+
Under a freestanding implementation, it is *implementation-defined*
|
| 25 |
+
whether a program can have more than one thread of execution.
|
| 26 |
+
|
| 27 |
+
For a signal handler that is not executed as a result of a call to the
|
| 28 |
+
`std::raise` function, it is unspecified which thread of execution
|
| 29 |
+
contains the signal handler invocation.
|
| 30 |
+
|
| 31 |
+
### Data races <a id="intro.races">[[intro.races]]</a>
|
| 32 |
|
| 33 |
The value of an object visible to a thread *T* at a particular point is
|
| 34 |
the initial value of the object, a value assigned to the object by *T*,
|
| 35 |
or a value assigned to the object by another thread, according to the
|
| 36 |
+
rules below.
|
| 37 |
+
|
| 38 |
+
[*Note 1*: In some cases, there may instead be undefined behavior. Much
|
| 39 |
+
of this section is motivated by the desire to support atomic operations
|
| 40 |
+
with explicit and detailed visibility constraints. However, it also
|
| 41 |
+
implicitly supports a simpler view for more restricted
|
| 42 |
+
programs. — *end note*]
|
| 43 |
|
| 44 |
Two expression evaluations *conflict* if one of them modifies a memory
|
| 45 |
+
location ([[intro.memory]]) and the other one reads or modifies the
|
| 46 |
same memory location.
|
| 47 |
|
| 48 |
The library defines a number of atomic operations (Clause [[atomics]])
|
| 49 |
and operations on mutexes (Clause [[thread]]) that are specially
|
| 50 |
identified as synchronization operations. These operations play a
|
| 54 |
acquire and release operation. A synchronization operation without an
|
| 55 |
associated memory location is a fence and can be either an acquire
|
| 56 |
fence, a release fence, or both an acquire and release fence. In
|
| 57 |
addition, there are relaxed atomic operations, which are not
|
| 58 |
synchronization operations, and atomic read-modify-write operations,
|
| 59 |
+
which have special characteristics.
|
| 60 |
+
|
| 61 |
+
[*Note 2*: For example, a call that acquires a mutex will perform an
|
| 62 |
+
acquire operation on the locations comprising the mutex.
|
| 63 |
+
Correspondingly, a call that releases the same mutex will perform a
|
| 64 |
+
release operation on those same locations. Informally, performing a
|
| 65 |
release operation on *A* forces prior side effects on other memory
|
| 66 |
locations to become visible to other threads that later perform a
|
| 67 |
consume or an acquire operation on *A*. “Relaxed” atomic operations are
|
| 68 |
not synchronization operations even though, like synchronization
|
| 69 |
+
operations, they cannot contribute to data races. — *end note*]
|
| 70 |
|
| 71 |
All modifications to a particular atomic object *M* occur in some
|
| 72 |
+
particular total order, called the *modification order* of *M*.
|
| 73 |
+
|
| 74 |
+
[*Note 3*: There is a separate order for each atomic object. There is
|
| 75 |
+
no requirement that these can be combined into a single total order for
|
| 76 |
+
all objects. In general this will be impossible since different threads
|
| 77 |
+
may observe modifications to different objects in inconsistent
|
| 78 |
+
orders. — *end note*]
|
| 79 |
|
| 80 |
A *release sequence* headed by a release operation *A* on an atomic
|
| 81 |
object *M* is a maximal contiguous sub-sequence of side effects in the
|
| 82 |
modification order of *M*, where the first operation is `A`, and every
|
| 83 |
subsequent operation
|
| 86 |
- is an atomic read-modify-write operation.
|
| 87 |
|
| 88 |
Certain library calls *synchronize with* other library calls performed
|
| 89 |
by another thread. For example, an atomic store-release synchronizes
|
| 90 |
with a load-acquire that takes its value from the store (
|
| 91 |
+
[[atomics.order]]).
|
| 92 |
+
|
| 93 |
+
[*Note 4*: Except in the specified cases, reading a later value does
|
| 94 |
+
not necessarily ensure visibility as described below. Such a requirement
|
| 95 |
+
would sometimes interfere with efficient implementation. — *end note*]
|
| 96 |
+
|
| 97 |
+
[*Note 5*: The specifications of the synchronization operations define
|
| 98 |
+
when one reads the value written by another. For atomic objects, the
|
| 99 |
+
definition is clear. All operations on a given mutex occur in a single
|
| 100 |
+
total order. Each mutex acquisition “reads the value written” by the
|
| 101 |
+
last mutex release. — *end note*]
|
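The store-release/load-acquire pairing described in the note above can be sketched in code. This is an illustrative example, not part of the standard text; the names `payload`, `ready`, and `release_acquire_demo` are invented for the sketch.

```cpp
#include <atomic>
#include <thread>

namespace {
int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // synchronization flag
}

// Returns the payload value observed by the reader thread.
int release_acquire_demo() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int observed = -1;
    std::thread writer([] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // store-release
    });
    std::thread reader([&observed] {
        // Spin until the load-acquire takes its value from the store-release;
        // that store then synchronizes with this load, so the write to
        // payload happens before the read below.
        while (!ready.load(std::memory_order_acquire)) {}
        observed = payload;
    });
    writer.join();
    reader.join();
    return observed;  // guaranteed to be 42: no data race, fully ordered
}
```

Because the store-release synchronizes with the load-acquire that takes its value from it, the non-atomic write to `payload` is guaranteed visible to the reader.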
| 102 |
|
| 103 |
An evaluation *A* *carries a dependency* to an evaluation *B* if
|
| 104 |
|
| 105 |
- the value of *A* is used as an operand of *B*, unless:
|
| 106 |
- *B* is an invocation of any specialization of
|
| 117 |
- *A* writes a scalar object or bit-field *M*, *B* reads the value
|
| 118 |
written by *A* from *M*, and *A* is sequenced before *B*, or
|
| 119 |
- for some evaluation *X*, *A* carries a dependency to *X*, and *X*
|
| 120 |
carries a dependency to *B*.
|
| 121 |
|
| 122 |
+
[*Note 6*: “Carries a dependency to” is a subset of “is sequenced
|
| 123 |
+
before”, and is similarly strictly intra-thread. — *end note*]
|
| 124 |
|
| 125 |
An evaluation *A* is *dependency-ordered before* an evaluation *B* if
|
| 126 |
|
| 127 |
- *A* performs a release operation on an atomic object *M*, and, in
|
| 128 |
another thread, *B* performs a consume operation on *M* and reads a
|
| 129 |
value written by any side effect in the release sequence headed by
|
| 130 |
*A*, or
|
| 131 |
- for some evaluation *X*, *A* is dependency-ordered before *X* and *X*
|
| 132 |
carries a dependency to *B*.
|
| 133 |
|
| 134 |
+
[*Note 7*: The relation “is dependency-ordered before” is analogous to
|
| 135 |
“synchronizes with”, but uses release/consume in place of
|
| 136 |
+
release/acquire. — *end note*]
|
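The "dependency-ordered before" relation can be sketched with a release store that publishes a pointer and a consume load that dereferences it. This is an illustrative example, not standard text; the names `Node`, `head`, and `consume_demo` are invented, and in practice most implementations strengthen `memory_order_consume` to acquire.

```cpp
#include <atomic>
#include <thread>

namespace {
struct Node { int value; };
std::atomic<Node*> head{nullptr};
}

// Publisher stores a pointer with release; consumer loads it with consume.
int consume_demo() {
    static Node node;
    node.value = 7;
    head.store(nullptr, std::memory_order_relaxed);
    int observed = -1;
    std::thread producer([] {
        // This release store heads a release sequence on `head`.
        head.store(&node, std::memory_order_release);
    });
    std::thread consumer([&observed] {
        Node* p;
        // The consume load carries a dependency into the dereference below,
        // so the store to node.value is dependency-ordered before the read.
        while (!(p = head.load(std::memory_order_consume))) {}
        observed = p->value;
    });
    producer.join();
    consumer.join();
    return observed;  // 7: ordered through the data dependency
}
```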
| 137 |
|
| 138 |
An evaluation *A* *inter-thread happens before* an evaluation *B* if
|
| 139 |
|
| 140 |
- *A* synchronizes with *B*, or
|
| 141 |
- *A* is dependency-ordered before *B*, or
|
| 144 |
- *A* is sequenced before *X* and *X* inter-thread happens before *B*,
|
| 145 |
or
|
| 146 |
- *A* inter-thread happens before *X* and *X* inter-thread happens
|
| 147 |
before *B*.
|
| 148 |
|
| 149 |
+
[*Note 8*: The “inter-thread happens before” relation describes
|
| 150 |
+
arbitrary concatenations of “sequenced before”, “synchronizes with” and
|
| 151 |
“dependency-ordered before” relationships, with two exceptions. The
|
| 152 |
first exception is that a concatenation is not permitted to end with
|
| 153 |
“dependency-ordered before” followed by “sequenced before”. The reason
|
| 154 |
for this limitation is that a consume operation participating in a
|
| 155 |
“dependency-ordered before” relationship provides ordering only with
|
|
| 159 |
provide the required ordering for a prior consume operation. The second
|
| 160 |
exception is that a concatenation is not permitted to consist entirely
|
| 161 |
of “sequenced before”. The reasons for this limitation are (1) to permit
|
| 162 |
“inter-thread happens before” to be transitively closed and (2) the
|
| 163 |
“happens before” relation, defined below, provides for relationships
|
| 164 |
+
consisting entirely of “sequenced before”. — *end note*]
|
| 165 |
|
| 166 |
+
An evaluation *A* *happens before* an evaluation *B* (or, equivalently,
|
| 167 |
+
*B* *happens after* *A*) if:
|
| 168 |
|
| 169 |
- *A* is sequenced before *B*, or
|
| 170 |
- *A* inter-thread happens before *B*.
|
| 171 |
|
| 172 |
The implementation shall ensure that no program execution demonstrates a
|
| 173 |
+
cycle in the “happens before” relation.
|
| 174 |
+
|
| 175 |
+
[*Note 9*: This cycle would otherwise be possible only through the use
|
| 176 |
+
of consume operations. — *end note*]
|
| 177 |
+
|
| 178 |
+
An evaluation *A* *strongly happens before* an evaluation *B* if either
|
| 179 |
+
|
| 180 |
+
- *A* is sequenced before *B*, or
|
| 181 |
+
- *A* synchronizes with *B*, or
|
| 182 |
+
- *A* strongly happens before *X* and *X* strongly happens before *B*.
|
| 183 |
+
|
| 184 |
+
[*Note 10*: In the absence of consume operations, the happens before
|
| 185 |
+
and strongly happens before relations are identical. Strongly happens
|
| 186 |
+
before essentially excludes consume operations. — *end note*]
|
| 187 |
|
| 188 |
A *visible side effect* *A* on a scalar object or bit-field *M* with
|
| 189 |
respect to a value computation *B* of *M* satisfies the conditions:
|
| 190 |
|
| 191 |
- *A* happens before *B* and
|
| 192 |
- there is no other side effect *X* to *M* such that *A* happens before
|
| 193 |
*X* and *X* happens before *B*.
|
| 194 |
|
| 195 |
The value of a non-atomic scalar object or bit-field *M*, as determined
|
| 196 |
by evaluation *B*, shall be the value stored by the visible side effect
|
| 197 |
+
*A*.
|
| 198 |
+
|
| 199 |
+
[*Note 11*: If there is ambiguity about which side effect to a
|
| 200 |
+
non-atomic object or bit-field is visible, then the behavior is either
|
| 201 |
+
unspecified or undefined. — *end note*]
|
| 202 |
+
|
| 203 |
+
[*Note 12*: This states that operations on ordinary objects are not
|
| 204 |
visibly reordered. This is not actually detectable without data races,
|
| 205 |
but it is necessary to ensure that data races, as defined below, and
|
| 206 |
with suitable restrictions on the use of atomics, correspond to data
|
| 207 |
+
races in a simple interleaved (sequentially consistent)
|
| 208 |
+
execution. — *end note*]
|
| 209 |
|
| 210 |
The value of an atomic object *M*, as determined by evaluation *B*,
|
| 211 |
shall be the value stored by some side effect *A* that modifies *M*,
|
| 212 |
+
where *B* does not happen before *A*.
|
| 213 |
+
|
| 214 |
+
[*Note 13*: The set of such side effects is also restricted by the rest
|
| 215 |
+
of the rules described here, and in particular, by the coherence
|
| 216 |
+
requirements below. — *end note*]
|
| 217 |
|
| 218 |
If an operation *A* that modifies an atomic object *M* happens before an
|
| 219 |
operation *B* that modifies *M*, then *A* shall be earlier than *B* in
|
| 220 |
+
the modification order of *M*.
|
| 221 |
+
|
| 222 |
+
[*Note 14*: This requirement is known as write-write
|
| 223 |
+
coherence. — *end note*]
|
| 224 |
|
| 225 |
If a value computation *A* of an atomic object *M* happens before a
|
| 226 |
value computation *B* of *M*, and *A* takes its value from a side effect
|
| 227 |
*X* on *M*, then the value computed by *B* shall either be the value
|
| 228 |
stored by *X* or the value stored by a side effect *Y* on *M*, where *Y*
|
| 229 |
+
follows *X* in the modification order of *M*.
|
| 230 |
+
|
| 231 |
+
[*Note 15*: This requirement is known as read-read
|
| 232 |
+
coherence. — *end note*]
|
| 233 |
|
| 234 |
If a value computation *A* of an atomic object *M* happens before an
|
| 235 |
operation *B* that modifies *M*, then *A* shall take its value from a
|
| 236 |
side effect *X* on *M*, where *X* precedes *B* in the modification order
|
| 237 |
+
of *M*.
|
| 238 |
+
|
| 239 |
+
[*Note 16*: This requirement is known as read-write
|
| 240 |
+
coherence. — *end note*]
|
| 241 |
|
| 242 |
If a side effect *X* on an atomic object *M* happens before a value
|
| 243 |
computation *B* of *M*, then the evaluation *B* shall take its value
|
| 244 |
from *X* or from a side effect *Y* that follows *X* in the modification
|
| 245 |
+
order of *M*.
|
| 246 |
|
| 247 |
+
[*Note 17*: This requirement is known as write-read
|
| 248 |
+
coherence. — *end note*]
|
| 249 |
|
| 250 |
+
[*Note 18*: The four preceding coherence requirements effectively
|
| 251 |
+
disallow compiler reordering of atomic operations to a single object,
|
| 252 |
+
even if both operations are relaxed loads. This effectively makes the
|
| 253 |
+
cache coherence guarantee provided by most hardware available to
|
| 254 |
+
C++ atomic operations. — *end note*]
|
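The coherence requirements above can be sketched as follows: even with relaxed ordering, successive loads of a single atomic object never appear to move backwards in its modification order. This is an illustrative example, not standard text; `coherence_demo` and the loop bounds are invented for the sketch.

```cpp
#include <atomic>
#include <thread>

// Read-read coherence demo: a reader's successive relaxed loads of one
// atomic object observe values in an order consistent with the object's
// modification order, i.e. the counter never appears to decrease.
bool coherence_demo() {
    std::atomic<int> counter{0};
    bool monotonic = true;
    std::thread writer([&] {
        for (int i = 1; i <= 10000; ++i)
            counter.store(i, std::memory_order_relaxed);  // modification order 1..10000
    });
    std::thread reader([&] {
        int prev = 0;
        for (int i = 0; i < 10000; ++i) {
            int cur = counter.load(std::memory_order_relaxed);
            if (cur < prev) monotonic = false;  // would violate read-read coherence
            prev = cur;
        }
    });
    writer.join();
    reader.join();
    return monotonic;  // always true: coherence forbids going backwards
}
```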
| 255 |
+
|
| 256 |
+
[*Note 19*: The value observed by a load of an atomic depends on the
|
| 257 |
+
“happens before” relation, which depends on the values observed by loads
|
| 258 |
+
of atomics. The intended reading is that there must exist an association
|
| 259 |
+
of atomic loads with modifications they observe that, together with
|
| 260 |
suitably chosen modification orders and the “happens before” relation
|
| 261 |
derived as described above, satisfy the resulting constraints as imposed
|
| 262 |
+
here. — *end note*]
|
| 263 |
|
| 264 |
Two actions are *potentially concurrent* if
|
| 265 |
|
| 266 |
- they are performed by different threads, or
|
| 267 |
+
- they are unsequenced, at least one is performed by a signal handler,
|
| 268 |
+
and they are not both performed by the same signal handler invocation.
|
| 269 |
|
| 270 |
The execution of a program contains a *data race* if it contains two
|
| 271 |
potentially concurrent conflicting actions, at least one of which is not
|
| 272 |
atomic, and neither happens before the other, except for the special
|
| 273 |
case for signal handlers described below. Any such data race results in
|
| 274 |
+
undefined behavior.
|
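The data-race definition above can be sketched in code: two threads performing conflicting, potentially concurrent accesses race unless at least one ordering mechanism applies. This is an illustrative example, not standard text; `racefree_counter_demo` and the iteration count are invented for the sketch.

```cpp
#include <atomic>
#include <thread>

// Two threads increment a shared counter. With a plain `int` these would be
// potentially concurrent conflicting actions, neither happening before the
// other: a data race and hence undefined behavior. Making the object atomic
// removes the race; each read-modify-write reads the last value in the
// counter's modification order.
int racefree_counter_demo() {
    std::atomic<int> counter{0};  // a plain `int` here would be a data race
    auto work = [&] {
        for (int i = 0; i < 100000; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter.load();  // 200000: no increments are lost
}
```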
| 275 |
|
| 276 |
+
[*Note 20*: It can be shown that programs that correctly use mutexes
|
| 277 |
+
and `memory_order_seq_cst` operations to prevent all data races and use
|
| 278 |
+
no other synchronization operations behave as if the operations executed
|
| 279 |
+
by their constituent threads were simply interleaved, with each value
|
| 280 |
+
computation of an object being taken from the last side effect on that
|
| 281 |
+
object in that interleaving. This is normally referred to as “sequential
|
| 282 |
+
consistency”. However, this applies only to data-race-free programs, and
|
| 283 |
+
data-race-free programs cannot observe most program transformations that
|
| 284 |
+
do not change single-threaded program semantics. In fact, most
|
| 285 |
+
single-threaded program transformations continue to be allowed, since
|
| 286 |
+
any program that behaves differently as a result must perform an
|
| 287 |
+
undefined operation. — *end note*]
|
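The sequential-consistency guarantee in the note above is visible in the classic store-buffering pattern. This is an illustrative example, not standard text; `seq_cst_demo` and the variable names are invented for the sketch.

```cpp
#include <atomic>
#include <thread>

// Store-buffering demo: with memory_order_seq_cst there is a single total
// order over these four operations, so at least one thread must observe the
// other's store; r1 == 0 && r2 == 0 is impossible. With relaxed (or even
// acquire/release) ordering, both threads could read 0.
bool seq_cst_demo() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return !(r1 == 0 && r2 == 0);  // guaranteed by sequential consistency
}
```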
| 288 |
+
|
| 289 |
+
Two accesses to the same object of type `volatile std::sig_atomic_t` do
not result in a data race if both occur in the same thread, even if one
or more occurs in a signal handler. For each signal handler invocation,
evaluations performed by the thread invoking a signal handler can be
divided into two groups *A* and *B*, such that no evaluations in *B*
happen before evaluations in *A*, and the evaluations of such
`volatile std::sig_atomic_t` objects take values as though all
evaluations in *A* happened before the execution of the signal handler
and the execution of the signal handler happened before all evaluations
in *B*.

[*Note 21*: Compiler transformations that introduce assignments to a
potentially shared memory location that would not be modified by the
abstract machine are generally precluded by this International Standard,
since such an assignment might overwrite another assignment by a
different thread in cases in which an abstract machine execution would
not have encountered a data race. This includes implementations of data
member assignment that overwrite adjacent members in separate memory
locations. Reordering of atomic loads in cases in which the atomics in
question may alias is also generally precluded, since this may violate
the coherence rules. — *end note*]

[*Note 22*: Transformations that introduce a speculative read of a
potentially shared memory location may not preserve the semantics of the
C++ program as defined in this International Standard, since they
potentially introduce a data race. However, they are typically valid in
the context of an optimizing compiler that targets a specific machine
with well-defined semantics for data races. They would be invalid for a
hypothetical machine that is not tolerant of races or provides hardware
race detection. — *end note*]

### Forward progress <a id="intro.progress">[[intro.progress]]</a>

The implementation may assume that any thread will eventually do one of
the following:

- terminate,
- make a call to a library I/O function,
- perform an access through a volatile glvalue, or
- perform a synchronization operation or an atomic operation.

[*Note 1*: This is intended to allow compiler transformations such as
removal of empty loops, even when termination cannot be
proven. — *end note*]

Executions of atomic functions that are either defined to be lock-free
([[atomics.flag]]) or indicated as lock-free ([[atomics.lockfree]]) are
*lock-free executions*.

- If there is only one thread that is not blocked ([[defns.block]]) in
  a standard library function, a lock-free execution in that thread
  shall complete. \[*Note 2*: Concurrently executing threads may prevent
  progress of a lock-free execution. For example, this situation can
  occur with load-locked store-conditional implementations. This
  property is sometimes termed obstruction-free. — *end note*]
- When one or more lock-free executions run concurrently, at least one
  should complete. \[*Note 3*: It is difficult for some implementations
  to provide absolute guarantees to this effect, since repeated and
  particularly inopportune interference from other threads may prevent
  forward progress, e.g., by repeatedly stealing a cache line for
  unrelated purposes between load-locked and store-conditional
  instructions. Implementations should ensure that such effects cannot
  indefinitely delay progress under expected operating conditions, and
  that such anomalies can therefore safely be ignored by programmers.
  Outside this document, this property is sometimes termed
  lock-free. — *end note*]

During the execution of a thread of execution, each of the following is
termed an *execution step*:

- termination of the thread of execution,
- performing an access through a volatile glvalue, or
- completion of a call to a library I/O function, a synchronization
  operation, or an atomic operation.

An invocation of a standard library function that blocks
([[defns.block]]) is considered to continuously execute execution steps
while waiting for the condition that it blocks on to be satisfied.

[*Example 1*: A library I/O function that blocks until the I/O
operation is complete can be considered to continuously check whether
the operation is complete. Each such check might consist of one or more
execution steps, for example using observable behavior of the abstract
machine. — *end example*]

[*Note 4*: Because of this and the preceding requirement regarding what
threads of execution have to perform eventually, it follows that no
thread of execution can execute forever without an execution step
occurring. — *end note*]

A thread of execution *makes progress* when an execution step occurs or
a lock-free execution does not complete because there are other
concurrent threads that are not blocked in a standard library function
(see above).

For a thread of execution providing *concurrent forward progress
guarantees*, the implementation ensures that the thread will eventually
make progress for as long as it has not terminated.

[*Note 5*: This is required regardless of whether or not other threads
of execution (if any) have been or are making progress. To eventually
fulfill this requirement means that this will happen in an unspecified
but finite amount of time. — *end note*]

It is *implementation-defined* whether the implementation-created thread
of execution that executes `main` ([[basic.start.main]]) and the
threads of execution created by `std::thread` ([[thread.thread.class]])
provide concurrent forward progress guarantees.

[*Note 6*: General-purpose implementations are encouraged to provide
these guarantees. — *end note*]

For a thread of execution providing *parallel forward progress
guarantees*, the implementation is not required to ensure that the
thread will eventually make progress if it has not yet executed any
execution step; once this thread has executed a step, it provides
concurrent forward progress guarantees.

[*Note 7*: This does not specify a requirement for when to start this
thread of execution, which will typically be specified by the entity
that creates this thread of execution. For example, a thread of
execution that provides concurrent forward progress guarantees and
executes tasks from a set of tasks in an arbitrary order, one after the
other, satisfies the requirements of parallel forward progress for these
tasks. — *end note*]

For a thread of execution providing *weakly parallel forward progress
guarantees*, the implementation does not ensure that the thread will
eventually make progress.

[*Note 8*: Threads of execution providing weakly parallel forward
progress guarantees cannot be expected to make progress regardless of
whether other threads make progress or not; however, blocking with
forward progress guarantee delegation, as defined below, can be used to
ensure that such threads of execution make progress
eventually. — *end note*]

Concurrent forward progress guarantees are stronger than parallel
forward progress guarantees, which in turn are stronger than weakly
parallel forward progress guarantees.

[*Note 9*: For example, some kinds of synchronization between threads
of execution may only make progress if the respective threads of
execution provide parallel forward progress guarantees, but will fail to
make progress under weakly parallel guarantees. — *end note*]

When a thread of execution *P* is specified to *block with forward
progress guarantee delegation* on the completion of a set *S* of threads
of execution, then throughout the whole time of *P* being blocked on
*S*, the implementation shall ensure that the forward progress
guarantees provided by at least one thread of execution in *S* are at
least as strong as *P*’s forward progress guarantees.

[*Note 10*: It is unspecified which thread or threads of execution in
*S* are chosen and for which number of execution steps. The
strengthening is not permanent and not necessarily in place for the rest
of the lifetime of the affected thread of execution. As long as *P* is
blocked, the implementation has to eventually select and potentially
strengthen a thread of execution in *S*. — *end note*]

Once a thread of execution in *S* terminates, it is removed from *S*.
Once *S* is empty, *P* is unblocked.

[*Note 11*: A thread of execution *B* thus can temporarily provide an
effectively stronger forward progress guarantee for a certain amount of
time, due to a second thread of execution *A* being blocked on it with
forward progress guarantee delegation. In turn, if *B* then blocks with
forward progress guarantee delegation on *C*, this may also temporarily
provide a stronger forward progress guarantee to *C*. — *end note*]

[*Note 12*: If all threads of execution in *S* finish executing (e.g.,
they terminate and do not use blocking synchronization incorrectly),
then *P*’s execution of the operation that blocks with forward progress
guarantee delegation will not result in *P*’s progress guarantee being
effectively weakened. — *end note*]

[*Note 13*: This does not remove any constraints regarding blocking
synchronization for threads of execution providing parallel or weakly
parallel forward progress guarantees because the implementation is not
required to strengthen a particular thread of execution whose too-weak
progress guarantee is preventing overall progress. — *end note*]

An implementation should ensure that the last value (in modification
order) assigned by an atomic or synchronization operation will become
visible to all other threads in a finite period of time.