From Jason Turner

[intro.multithread]

Files changed (1): from.md → to.md (+302 -138, renamed)

from.md (old version):

@@ -1,65 +1,50 @@
  ## Multi-threaded executions and data races <a id="intro.multithread">[[intro.multithread]]</a>

  A *thread of execution* (also known as a *thread*) is a single flow of
  control within a program, including the initial invocation of a specific
  top-level function, and recursively including every function invocation
- subsequently executed by the thread. When one thread creates another,
- the initial call to the top-level function of the new thread is executed
- by the new thread, not by the creating thread. Every thread in a program
- can potentially access every object and function in a program.[^10]
- Under a hosted implementation, a C++ program can have more than one
- thread running concurrently. The execution of each thread proceeds as
- defined by the remainder of this standard. The execution of the entire
- program consists of an execution of all of its threads. Usually the
- execution can be viewed as an interleaving of all its threads. However,
- some kinds of atomic operations, for example, allow executions
- inconsistent with a simple interleaving, as described below. Under a
- freestanding implementation, it is *implementation-defined* whether a
- program can have more than one thread of execution.

- A signal handler that is executed as a result of a call to the `raise`
- function belongs to the same thread of execution as the call to the
- `raise` function. Otherwise it is unspecified which thread of execution
- contains a signal handler invocation.

- Implementations should ensure that all unblocked threads eventually make
- progress. Standard library functions may silently block on I/O or locks.
- Factors in the execution environment, including externally-imposed
- thread priorities, may prevent an implementation from making certain
- guarantees of forward progress.

- Executions of atomic functions that are either defined to be lock-free (
- [[atomics.flag]]) or indicated as lock-free ([[atomics.lockfree]]) are
- *lock-free executions*.

- - If there is only one unblocked thread, a lock-free execution in that
-   thread shall complete. Concurrently executing threads may prevent
-   progress of a lock-free execution. For example, this situation can
-   occur with load-locked store-conditional implementations. This
-   property is sometimes termed obstruction-free.
- - When one or more lock-free executions run concurrently, at least one
-   should complete. It is difficult for some implementations to provide
-   absolute guarantees to this effect, since repeated and particularly
-   inopportune interference from other threads may prevent forward
-   progress, e.g., by repeatedly stealing a cache line for unrelated
-   purposes between load-locked and store-conditional instructions.
-   Implementations should ensure that such effects cannot indefinitely
-   delay progress under expected operating conditions, and that such
-   anomalies can therefore safely be ignored by programmers. Outside this
-   International Standard, this property is sometimes termed lock-free.

  The value of an object visible to a thread *T* at a particular point is
  the initial value of the object, a value assigned to the object by *T*,
  or a value assigned to the object by another thread, according to the
- rules below. In some cases, there may instead be undefined behavior.
- Much of this section is motivated by the desire to support atomic
- operations with explicit and detailed visibility constraints. However,
- it also implicitly supports a simpler view for more restricted programs.

  Two expression evaluations *conflict* if one of them modifies a memory
- location ([[intro.memory]]) and the other one accesses or modifies the
  same memory location.

  The library defines a number of atomic operations (Clause [[atomics]])
  and operations on mutexes (Clause [[thread]]) that are specially
  identified as synchronization operations. These operations play a
@@ -69,30 +54,30 @@ consume operation, an acquire operation, a release operation, or both an
  acquire and release operation. A synchronization operation without an
  associated memory location is a fence and can be either an acquire
  fence, a release fence, or both an acquire and release fence. In
  addition, there are relaxed atomic operations, which are not
  synchronization operations, and atomic read-modify-write operations,
- which have special characteristics. For example, a call that acquires a
- mutex will perform an acquire operation on the locations comprising the
- mutex. Correspondingly, a call that releases the same mutex will perform
- a release operation on those same locations. Informally, performing a
  release operation on *A* forces prior side effects on other memory
  locations to become visible to other threads that later perform a
  consume or an acquire operation on *A*. “Relaxed” atomic operations are
  not synchronization operations even though, like synchronization
- operations, they cannot contribute to data races.

  All modifications to a particular atomic object *M* occur in some
- particular total order, called the *modification order* of *M*. If *A*
- and *B* are modifications of an atomic object *M* and *A* happens before
- (as defined below) *B*, then *A* shall precede *B* in the modification
- order of *M*, which is defined below. This states that the modification
- orders must respect the “happens before” relationship. There is a
- separate order for each atomic object. There is no requirement that
- these can be combined into a single total order for all objects. In
- general this will be impossible since different threads may observe
- modifications to different objects in inconsistent orders.

  A *release sequence* headed by a release operation *A* on an atomic
  object *M* is a maximal contiguous sub-sequence of side effects in the
  modification order of *M*, where the first operation is `A`, and every
  subsequent operation
@@ -101,18 +86,21 @@ subsequent operation
  - is an atomic read-modify-write operation.

  Certain library calls *synchronize with* other library calls performed
  by another thread. For example, an atomic store-release synchronizes
  with a load-acquire that takes its value from the store (
- [[atomics.order]]). Except in the specified cases, reading a later value
- does not necessarily ensure visibility as described below. Such a
- requirement would sometimes interfere with efficient implementation. The
- specifications of the synchronization operations define when one reads
- the value written by another. For atomic objects, the definition is
- clear. All operations on a given mutex occur in a single total order.
- Each mutex acquisition “reads the value written” by the last mutex
- release.

  An evaluation *A* *carries a dependency* to an evaluation *B* if

  - the value of *A* is used as an operand of *B*, unless:
    - *B* is an invocation of any specialization of
@@ -129,25 +117,25 @@ An evaluation *A* *carries a dependency* to an evaluation *B* if
  - *A* writes a scalar object or bit-field *M*, *B* reads the value
    written by *A* from *M*, and *A* is sequenced before *B*, or
  - for some evaluation *X*, *A* carries a dependency to *X*, and *X*
    carries a dependency to *B*.

- “Carries a dependency to” is a subset of “is sequenced before”, and is
- similarly strictly intra-thread.

  An evaluation *A* is *dependency-ordered before* an evaluation *B* if

  - *A* performs a release operation on an atomic object *M*, and, in
    another thread, *B* performs a consume operation on *M* and reads a
    value written by any side effect in the release sequence headed by
    *A*, or
  - for some evaluation *X*, *A* is dependency-ordered before *X* and *X*
    carries a dependency to *B*.

- The relation “is dependency-ordered before” is analogous to
  “synchronizes with”, but uses release/consume in place of
- release/acquire.

  An evaluation *A* *inter-thread happens before* an evaluation *B* if

  - *A* synchronizes with *B*, or
  - *A* is dependency-ordered before *B*, or
@@ -156,12 +144,12 @@ An evaluation *A* *inter-thread happens before* an evaluation *B* if
  - *A* is sequenced before *X* and *X* inter-thread happens before *B*,
    or
  - *A* inter-thread happens before *X* and *X* inter-thread happens
    before *B*.

- The “inter-thread happens before” relation describes arbitrary
- concatenations of “sequenced before”, “synchronizes with” and
  “dependency-ordered before” relationships, with two exceptions. The
  first exception is that a concatenation is not permitted to end with
  “dependency-ordered before” followed by “sequenced before”. The reason
  for this limitation is that a consume operation participating in a
  “dependency-ordered before” relationship provides ordering only with
@@ -171,140 +159,316 @@ such a concatenation is that any subsequent release operation will
  provide the required ordering for a prior consume operation. The second
  exception is that a concatenation is not permitted to consist entirely
  of “sequenced before”. The reasons for this limitation are (1) to permit
  “inter-thread happens before” to be transitively closed and (2) the
  “happens before” relation, defined below, provides for relationships
- consisting entirely of “sequenced before”.

- An evaluation *A* *happens before* an evaluation *B* if:

  - *A* is sequenced before *B*, or
  - *A* inter-thread happens before *B*.

  The implementation shall ensure that no program execution demonstrates a
- cycle in the “happens before” relation. This cycle would otherwise be
- possible only through the use of consume operations.

  A *visible side effect* *A* on a scalar object or bit-field *M* with
  respect to a value computation *B* of *M* satisfies the conditions:

  - *A* happens before *B* and
  - there is no other side effect *X* to *M* such that *A* happens before
    *X* and *X* happens before *B*.

  The value of a non-atomic scalar object or bit-field *M*, as determined
  by evaluation *B*, shall be the value stored by the visible side effect
- *A*. If there is ambiguity about which side effect to a non-atomic
- object or bit-field is visible, then the behavior is either unspecified
- or undefined. This states that operations on ordinary objects are not
  visibly reordered. This is not actually detectable without data races,
  but it is necessary to ensure that data races, as defined below, and
  with suitable restrictions on the use of atomics, correspond to data
- races in a simple interleaved (sequentially consistent) execution.

  The value of an atomic object *M*, as determined by evaluation *B*,
  shall be the value stored by some side effect *A* that modifies *M*,
- where *B* does not happen before *A*. The set of such side effects is
- also restricted by the rest of the rules described here, and in
- particular, by the coherence requirements below.

  If an operation *A* that modifies an atomic object *M* happens before an
  operation *B* that modifies *M*, then *A* shall be earlier than *B* in
- the modification order of *M*. This requirement is known as write-write
- coherence.

  If a value computation *A* of an atomic object *M* happens before a
  value computation *B* of *M*, and *A* takes its value from a side effect
  *X* on *M*, then the value computed by *B* shall either be the value
  stored by *X* or the value stored by a side effect *Y* on *M*, where *Y*
- follows *X* in the modification order of *M*. This requirement is known
- as read-read coherence.

  If a value computation *A* of an atomic object *M* happens before an
  operation *B* that modifies *M*, then *A* shall take its value from a
  side effect *X* on *M*, where *X* precedes *B* in the modification order
- of *M*. This requirement is known as read-write coherence.

  If a side effect *X* on an atomic object *M* happens before a value
  computation *B* of *M*, then the evaluation *B* shall take its value
  from *X* or from a side effect *Y* that follows *X* in the modification
- order of *M*. This requirement is known as write-read coherence.

- The four preceding coherence requirements effectively disallow compiler
- reordering of atomic operations to a single object, even if both
- operations are relaxed loads. This effectively makes the cache coherence
- guarantee provided by most hardware available to C++ atomic operations.

- The value observed by a load of an atomic depends on the “happens
- before” relation, which depends on the values observed by loads of
- atomics. The intended reading is that there must exist an association of
- atomic loads with modifications they observe that, together with
  suitably chosen modification orders and the “happens before” relation
  derived as described above, satisfy the resulting constraints as imposed
- here.

  Two actions are *potentially concurrent* if

  - they are performed by different threads, or
- - they are unsequenced, and at least one is performed by a signal
-   handler.

  The execution of a program contains a *data race* if it contains two
  potentially concurrent conflicting actions, at least one of which is not
  atomic, and neither happens before the other, except for the special
  case for signal handlers described below. Any such data race results in
- undefined behavior. It can be shown that programs that correctly use
- mutexes and `memory_order_seq_cst` operations to prevent all data races
- and use no other synchronization operations behave as if the operations
- executed by their constituent threads were simply interleaved, with each
- value computation of an object being taken from the last side effect on
- that object in that interleaving. This is normally referred to as
- “sequential consistency”. However, this applies only to data-race-free
- programs, and data-race-free programs cannot observe most program
- transformations that do not change single-threaded program semantics. In
- fact, most single-threaded program transformations continue to be
- allowed, since any program that behaves differently as a result must
- perform an undefined operation.

- Two accesses to the same object of type `volatile sig_atomic_t` do not
- result in a data race if both occur in the same thread, even if one or
- more occurs in a signal handler. For each signal handler invocation,
  evaluations performed by the thread invoking a signal handler can be
  divided into two groups *A* and *B*, such that no evaluations in *B*
  happen before evaluations in *A*, and the evaluations of such
- `volatile sig_atomic_t` objects take values as though all evaluations in
- *A* happened before the execution of the signal handler and the
- execution of the signal handler happened before all evaluations in *B*.

- Compiler transformations that introduce assignments to a potentially
- shared memory location that would not be modified by the abstract
- machine are generally precluded by this standard, since such an
- assignment might overwrite another assignment by a different thread in
- cases in which an abstract machine execution would not have encountered
- a data race. This includes implementations of data member assignment
- that overwrite adjacent members in separate memory locations. Reordering
- of atomic loads in cases in which the atomics in question may alias is
- also generally precluded, since this may violate the coherence rules.

- Transformations that introduce a speculative read of a potentially
- shared memory location may not preserve the semantics of the C++ program
- as defined in this standard, since they potentially introduce a data
- race. However, they are typically valid in the context of an optimizing
- compiler that targets a specific machine with well-defined semantics for
- data races. They would be invalid for a hypothetical machine that is not
- tolerant of races or provides hardware race detection.

  The implementation may assume that any thread will eventually do one of
  the following:

  - terminate,
  - make a call to a library I/O function,
- - access or modify a volatile object, or
  - perform a synchronization operation or an atomic operation.

- This is intended to allow compiler transformations such as removal of
- empty loops, even when termination cannot be proven.

  An implementation should ensure that the last value (in modification
  order) assigned by an atomic or synchronization operation will become
  visible to all other threads in a finite period of time.

to.md (new version):

  ## Multi-threaded executions and data races <a id="intro.multithread">[[intro.multithread]]</a>

  A *thread of execution* (also known as a *thread*) is a single flow of
  control within a program, including the initial invocation of a specific
  top-level function, and recursively including every function invocation
+ subsequently executed by the thread.

+ [*Note 1*: When one thread creates another, the initial call to the
+ top-level function of the new thread is executed by the new thread, not
+ by the creating thread. — *end note*]
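
To make the note concrete, here is a minimal sketch using `std::thread`
(the function names and output text are ours, not part of the standard
text): the top-level function passed to the constructor is executed by
the newly created thread, while the constructor itself returns in the
creating thread.

```cpp
#include <iostream>
#include <thread>

// Top-level function of the new thread: this body runs on the new
// thread, not on the thread that constructs the std::thread object.
void worker() {
    std::cout << "worker runs on thread " << std::this_thread::get_id() << '\n';
}

int main() {
    std::cout << "main runs on thread " << std::this_thread::get_id() << '\n';
    std::thread t(worker);  // the creating thread continues here
    t.join();               // wait for the new thread to finish
}
```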
 
+ Every thread in a program can potentially access every object and
+ function in a program.[^11] Under a hosted implementation, a C++ program
+ can have more than one thread running concurrently. The execution of
+ each thread proceeds as defined by the remainder of this International
+ Standard. The execution of the entire program consists of an execution
+ of all of its threads.

+ [*Note 2*: Usually the execution can be viewed as an interleaving of
+ all its threads. However, some kinds of atomic operations, for example,
+ allow executions inconsistent with a simple interleaving, as described
+ below. — *end note*]

+ Under a freestanding implementation, it is *implementation-defined*
+ whether a program can have more than one thread of execution.
+
+ For a signal handler that is not executed as a result of a call to the
+ `std::raise` function, it is unspecified which thread of execution
+ contains the signal handler invocation.
+
+ ### Data races <a id="intro.races">[[intro.races]]</a>

  The value of an object visible to a thread *T* at a particular point is
  the initial value of the object, a value assigned to the object by *T*,
  or a value assigned to the object by another thread, according to the
+ rules below.
+
+ [*Note 1*: In some cases, there may instead be undefined behavior. Much
+ of this section is motivated by the desire to support atomic operations
+ with explicit and detailed visibility constraints. However, it also
+ implicitly supports a simpler view for more restricted
+ programs. — *end note*]

  Two expression evaluations *conflict* if one of them modifies a memory
+ location ([[intro.memory]]) and the other one reads or modifies the
  same memory location.
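
As a quick illustration (the variable and helper names are ours): two
reads of the same location never conflict; a read and a write, or two
writes, do. Whether a conflicting pair is actually a *data race* depends
on the “potentially concurrent” and “happens before” rules given below.

```cpp
int x = 0;

int  read_x()  { return x + 1; }  // reads the memory location of x
void write_x() { x = 2; }         // modifies the memory location of x

// read_x() and write_x() conflict: one modifies x, the other reads it.
// Two calls to read_x() do not conflict: neither modifies x.
```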

  The library defines a number of atomic operations (Clause [[atomics]])
  and operations on mutexes (Clause [[thread]]) that are specially
  identified as synchronization operations. These operations play a

  acquire and release operation. A synchronization operation without an
  associated memory location is a fence and can be either an acquire
  fence, a release fence, or both an acquire and release fence. In
  addition, there are relaxed atomic operations, which are not
  synchronization operations, and atomic read-modify-write operations,
+ which have special characteristics.
+
+ [*Note 2*: For example, a call that acquires a mutex will perform an
+ acquire operation on the locations comprising the mutex.
+ Correspondingly, a call that releases the same mutex will perform a
+ release operation on those same locations. Informally, performing a
  release operation on *A* forces prior side effects on other memory
  locations to become visible to other threads that later perform a
  consume or an acquire operation on *A*. “Relaxed” atomic operations are
  not synchronization operations even though, like synchronization
+ operations, they cannot contribute to data races. — *end note*]
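
A small sketch of the mutex case described in the note, using
`std::mutex` and `std::lock_guard` (the `counter` variable and function
names are illustrative): the unlock at the end of a critical section is
the release operation, the next lock of the same mutex is the acquire
operation, and together they make the first thread's write visible to
the second.

```cpp
#include <mutex>
#include <thread>

std::mutex m;
int counter = 0;  // ordinary (non-atomic) object, protected by m

void increment() {
    std::lock_guard<std::mutex> lock(m);  // acquire operation on m
    ++counter;  // writes made under the lock by other threads are visible
}               // unlock in the destructor: release operation on m

int main() {
    std::thread t1(increment), t2(increment);
    t1.join();
    t2.join();
    // counter == 2: each unlock synchronized with the next lock,
    // so neither increment was lost.
}
```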

  All modifications to a particular atomic object *M* occur in some
+ particular total order, called the *modification order* of *M*.
+
+ [*Note 3*: There is a separate order for each atomic object. There is
+ no requirement that these can be combined into a single total order for
+ all objects. In general this will be impossible since different threads
+ may observe modifications to different objects in inconsistent
+ orders. — *end note*]
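
A hedged sketch of why no single total order across *different* objects
can be assumed (all names are ours; this is the well-known “independent
reads of independent writes” pattern): with release stores and acquire
loads, the two readers may disagree about whether `x` or `y` was written
first, even though each object's own modification order is respected.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2, r3, r4;

void writer_x() { x.store(1, std::memory_order_release); }
void writer_y() { y.store(1, std::memory_order_release); }

void reader_xy() {
    r1 = x.load(std::memory_order_acquire);
    r2 = y.load(std::memory_order_acquire);
}

void reader_yx() {
    r3 = y.load(std::memory_order_acquire);
    r4 = x.load(std::memory_order_acquire);
}

int main() {
    std::thread a(writer_x), b(writer_y), c(reader_xy), d(reader_yx);
    a.join(); b.join(); c.join(); d.join();
    // r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0 is a permitted outcome:
    // the readers disagree about which write came first. Making every
    // operation memory_order_seq_cst would forbid this outcome.
}
```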
 
 
  A *release sequence* headed by a release operation *A* on an atomic
  object *M* is a maximal contiguous sub-sequence of side effects in the
  modification order of *M*, where the first operation is `A`, and every
  subsequent operation

  - is an atomic read-modify-write operation.

  Certain library calls *synchronize with* other library calls performed
  by another thread. For example, an atomic store-release synchronizes
  with a load-acquire that takes its value from the store (
+ [[atomics.order]]).
+
+ [*Note 4*: Except in the specified cases, reading a later value does
+ not necessarily ensure visibility as described below. Such a requirement
+ would sometimes interfere with efficient implementation. — *end note*]
+
+ [*Note 5*: The specifications of the synchronization operations define
+ when one reads the value written by another. For atomic objects, the
+ definition is clear. All operations on a given mutex occur in a single
+ total order. Each mutex acquisition “reads the value written” by the
+ last mutex release. — *end note*]
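
The store-release/load-acquire pairing mentioned here is the classic
publication pattern; a minimal sketch (names are ours):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                    // ordinary object, published via `ready`
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                     // sequenced before the store
    ready.store(true, std::memory_order_release);  // store-release
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // load-acquire
        ;  // spin until the store's value is observed
    // The store-release synchronizes with the load-acquire that took its
    // value, so the write to `data` is visible here.
    assert(data == 42);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```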

  An evaluation *A* *carries a dependency* to an evaluation *B* if

  - the value of *A* is used as an operand of *B*, unless:
    - *B* is an invocation of any specialization of

  - *A* writes a scalar object or bit-field *M*, *B* reads the value
    written by *A* from *M*, and *A* is sequenced before *B*, or
  - for some evaluation *X*, *A* carries a dependency to *X*, and *X*
    carries a dependency to *B*.

+ [*Note 6*: “Carries a dependency to” is a subset of “is sequenced
+ before”, and is similarly strictly intra-thread. — *end note*]

  An evaluation *A* is *dependency-ordered before* an evaluation *B* if

  - *A* performs a release operation on an atomic object *M*, and, in
    another thread, *B* performs a consume operation on *M* and reads a
    value written by any side effect in the release sequence headed by
    *A*, or
  - for some evaluation *X*, *A* is dependency-ordered before *X* and *X*
    carries a dependency to *B*.

+ [*Note 7*: The relation “is dependency-ordered before” is analogous to
  “synchronizes with”, but uses release/consume in place of
+ release/acquire. — *end note*]
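
A sketch of a release/consume pairing (names are ours; note that current
implementations typically strengthen `memory_order_consume` to
`memory_order_acquire`, which is correct but stronger): the consume load
carries a dependency into the pointer dereference, and that dependency is
what makes the producer's write visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int value; };

std::atomic<Payload*> ptr{nullptr};

void producer() {
    Payload* p = new Payload{42};
    ptr.store(p, std::memory_order_release);  // release operation on ptr
}

void consumer() {
    Payload* p;
    while (!(p = ptr.load(std::memory_order_consume)))  // consume operation
        ;
    // The value of the load (p) is used as an operand of the dereference,
    // so the load carries a dependency to it: p->value is 42 here.
    assert(p->value == 42);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
    delete ptr.load();
}
```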

  An evaluation *A* *inter-thread happens before* an evaluation *B* if

  - *A* synchronizes with *B*, or
  - *A* is dependency-ordered before *B*, or

  - *A* is sequenced before *X* and *X* inter-thread happens before *B*,
    or
  - *A* inter-thread happens before *X* and *X* inter-thread happens
    before *B*.

+ [*Note 8*: The “inter-thread happens before” relation describes
+ arbitrary concatenations of “sequenced before”, “synchronizes with” and
  “dependency-ordered before” relationships, with two exceptions. The
  first exception is that a concatenation is not permitted to end with
  “dependency-ordered before” followed by “sequenced before”. The reason
  for this limitation is that a consume operation participating in a
  “dependency-ordered before” relationship provides ordering only with

  provide the required ordering for a prior consume operation. The second
  exception is that a concatenation is not permitted to consist entirely
  of “sequenced before”. The reasons for this limitation are (1) to permit
  “inter-thread happens before” to be transitively closed and (2) the
  “happens before” relation, defined below, provides for relationships
+ consisting entirely of “sequenced before”. — *end note*]

+ An evaluation *A* *happens before* an evaluation *B* (or, equivalently,
+ *B* *happens after* *A*) if:

  - *A* is sequenced before *B*, or
  - *A* inter-thread happens before *B*.

  The implementation shall ensure that no program execution demonstrates a
+ cycle in the “happens before” relation.
+
+ [*Note 9*: This cycle would otherwise be possible only through the use
+ of consume operations. — *end note*]
+
+ An evaluation *A* *strongly happens before* an evaluation *B* if either
+
+ - *A* is sequenced before *B*, or
+ - *A* synchronizes with *B*, or
+ - *A* strongly happens before *X* and *X* strongly happens before *B*.
+
+ [*Note 10*: In the absence of consume operations, the happens before
+ and strongly happens before relations are identical. Strongly happens
+ before essentially excludes consume operations. — *end note*]

  A *visible side effect* *A* on a scalar object or bit-field *M* with
  respect to a value computation *B* of *M* satisfies the conditions:

  - *A* happens before *B* and
  - there is no other side effect *X* to *M* such that *A* happens before
    *X* and *X* happens before *B*.

  The value of a non-atomic scalar object or bit-field *M*, as determined
  by evaluation *B*, shall be the value stored by the visible side effect
+ *A*.
+
+ [*Note 11*: If there is ambiguity about which side effect to a
+ non-atomic object or bit-field is visible, then the behavior is either
+ unspecified or undefined. — *end note*]
+
+ [*Note 12*: This states that operations on ordinary objects are not
  visibly reordered. This is not actually detectable without data races,
  but it is necessary to ensure that data races, as defined below, and
  with suitable restrictions on the use of atomics, correspond to data
+ races in a simple interleaved (sequentially consistent)
+ execution. — *end note*]

  The value of an atomic object *M*, as determined by evaluation *B*,
  shall be the value stored by some side effect *A* that modifies *M*,
+ where *B* does not happen before *A*.
+
+ [*Note 13*: The set of such side effects is also restricted by the rest
+ of the rules described here, and in particular, by the coherence
+ requirements below. — *end note*]

  If an operation *A* that modifies an atomic object *M* happens before an
  operation *B* that modifies *M*, then *A* shall be earlier than *B* in
+ the modification order of *M*.
+
+ [*Note 14*: This requirement is known as write-write
+ coherence. — *end note*]

  If a value computation *A* of an atomic object *M* happens before a
  value computation *B* of *M*, and *A* takes its value from a side effect
  *X* on *M*, then the value computed by *B* shall either be the value
  stored by *X* or the value stored by a side effect *Y* on *M*, where *Y*
+ follows *X* in the modification order of *M*.
+
+ [*Note 15*: This requirement is known as read-read
+ coherence. — *end note*]

  If a value computation *A* of an atomic object *M* happens before an
  operation *B* that modifies *M*, then *A* shall take its value from a
  side effect *X* on *M*, where *X* precedes *B* in the modification order
+ of *M*.
+
+ [*Note 16*: This requirement is known as read-write
+ coherence. — *end note*]

  If a side effect *X* on an atomic object *M* happens before a value
  computation *B* of *M*, then the evaluation *B* shall take its value
  from *X* or from a side effect *Y* that follows *X* in the modification
+ order of *M*.

+ [*Note 17*: This requirement is known as write-read
+ coherence. — *end note*]

+ [*Note 18*: The four preceding coherence requirements effectively
+ disallow compiler reordering of atomic operations to a single object,
+ even if both operations are relaxed loads. This effectively makes the
+ cache coherence guarantee provided by most hardware available to
+ C++ atomic operations. — *end note*]
+
+ [*Note 19*: The value observed by a load of an atomic depends on the
+ “happens before” relation, which depends on the values observed by loads
+ of atomics. The intended reading is that there must exist an association
+ of atomic loads with modifications they observe that, together with
  suitably chosen modification orders and the “happens before” relation
  derived as described above, satisfy the resulting constraints as imposed
+ here. — *end note*]
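
A sketch of what the four coherence requirements rule out, using relaxed
operations on a single atomic (names are ours):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> v{0};

void writer() {
    v.store(1, std::memory_order_relaxed);
    v.store(2, std::memory_order_relaxed);
    // Write-write coherence: the first store happens before the second,
    // so 1 precedes 2 in v's modification order, which is 0, 1, 2.
}

void reader() {
    int first  = v.load(std::memory_order_relaxed);
    int second = v.load(std::memory_order_relaxed);
    // Read-read coherence: `second` cannot observe a value that precedes
    // the value `first` observed in v's modification order, even though
    // both loads are relaxed.
    assert(!(first == 1 && second == 0));
    assert(!(first == 2 && second == 0));
    assert(!(first == 2 && second == 1));
}

int main() {
    std::thread a(writer), b(reader);
    a.join();
    b.join();
}
```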

  Two actions are *potentially concurrent* if

  - they are performed by different threads, or
+ - they are unsequenced, at least one is performed by a signal handler,
+   and they are not both performed by the same signal handler invocation.

  The execution of a program contains a *data race* if it contains two
  potentially concurrent conflicting actions, at least one of which is not
  atomic, and neither happens before the other, except for the special
  case for signal handlers described below. Any such data race results in
+ undefined behavior.

+ [*Note 20*: It can be shown that programs that correctly use mutexes
+ and `memory_order_seq_cst` operations to prevent all data races and use
+ no other synchronization operations behave as if the operations executed
+ by their constituent threads were simply interleaved, with each value
+ computation of an object being taken from the last side effect on that
+ object in that interleaving. This is normally referred to as “sequential
+ consistency”. However, this applies only to data-race-free programs, and
+ data-race-free programs cannot observe most program transformations that
+ do not change single-threaded program semantics. In fact, most
+ single-threaded program transformations continue to be allowed, since
+ any program that behaves differently as a result must perform an
+ undefined operation. — *end note*]
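
A minimal sketch of a data race and its repair (names are ours):

```cpp
#include <atomic>
#include <thread>

int plain = 0;                // ordinary int: unsynchronized access races
std::atomic<int> counter{0};  // atomic int: accesses cannot race

int main() {
    // Undefined behavior: the two ++plain evaluations conflict, are
    // potentially concurrent, neither is atomic, and neither happens
    // before the other. That is a data race.
    std::thread t1([] { ++plain; });
    std::thread t2([] { ++plain; });
    t1.join();
    t2.join();

    // Well-defined: ++counter is an atomic read-modify-write, so both
    // increments appear in counter's modification order and the final
    // value is 2.
    std::thread t3([] { ++counter; });
    std::thread t4([] { ++counter; });
    t3.join();
    t4.join();
}
```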

+ Two accesses to the same object of type `volatile std::sig_atomic_t` do
+ not result in a data race if both occur in the same thread, even if one
+ or more occurs in a signal handler. For each signal handler invocation,
  evaluations performed by the thread invoking a signal handler can be
  divided into two groups *A* and *B*, such that no evaluations in *B*
  happen before evaluations in *A*, and the evaluations of such
+ `volatile std::sig_atomic_t` objects take values as though all
+ evaluations in *A* happened before the execution of the signal handler
+ and the execution of the signal handler happened before all evaluations
+ in *B*.
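
A sketch of the usual pattern this rule enables (the flag name is ours):
a signal handler sets a `volatile std::sig_atomic_t` flag that the same
thread later reads, and that pair of accesses is not a data race.

```cpp
#include <csignal>
#include <cstdio>

// volatile std::sig_atomic_t is one of the few things a signal handler
// can portably touch.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void handler(int) {
    got_signal = 1;  // runs on the same thread it interrupts: no data race
}

int main() {
    std::signal(SIGINT, handler);
    while (!got_signal) {
        // ... do work; each iteration re-reads the volatile flag ...
    }
    std::puts("caught SIGINT");
}
```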

+ [*Note 21*: Compiler transformations that introduce assignments to a
+ potentially shared memory location that would not be modified by the
+ abstract machine are generally precluded by this International Standard,
+ since such an assignment might overwrite another assignment by a
+ different thread in cases in which an abstract machine execution would
+ not have encountered a data race. This includes implementations of data
+ member assignment that overwrite adjacent members in separate memory
+ locations. Reordering of atomic loads in cases in which the atomics in
+ question may alias is also generally precluded, since this may violate
+ the coherence rules. — *end note*]

+ [*Note 22*: Transformations that introduce a speculative read of a
+ potentially shared memory location may not preserve the semantics of the
+ C++ program as defined in this International Standard, since they
+ potentially introduce a data race. However, they are typically valid in
+ the context of an optimizing compiler that targets a specific machine
+ with well-defined semantics for data races. They would be invalid for a
+ hypothetical machine that is not tolerant of races or provides hardware
+ race detection. — *end note*]
+
+ ### Forward progress <a id="intro.progress">[[intro.progress]]</a>

  The implementation may assume that any thread will eventually do one of
  the following:

  - terminate,
  - make a call to a library I/O function,
+ - perform an access through a volatile glvalue, or
  - perform a synchronization operation or an atomic operation.

+ [*Note 1*: This is intended to allow compiler transformations such as
+ removal of empty loops, even when termination cannot be
+ proven. — *end note*]
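
A hedged illustration of what this assumption permits under this wording
(function names are ours):

```cpp
#include <atomic>

std::atomic<bool> stop{false};

void bad_spin() {
    // Performs none of the listed actions: it never terminates, does no
    // I/O, no volatile access, and no atomic or synchronization
    // operation. The implementation may assume it completes, so calling
    // it has undefined behavior and the loop may be removed entirely.
    while (true) { }
}

void good_spin() {
    // Each iteration performs an atomic operation, so this loop is a
    // conforming way to wait for another thread to set `stop`.
    while (!stop.load(std::memory_order_relaxed)) { }
}
```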

+ Executions of atomic functions that are either defined to be lock-free (
+ [[atomics.flag]]) or indicated as lock-free ([[atomics.lockfree]]) are
+ *lock-free executions*.
+
+ - If there is only one thread that is not blocked ([[defns.block]]) in
+   a standard library function, a lock-free execution in that thread
+   shall complete. [*Note 2*: Concurrently executing threads may prevent
+   progress of a lock-free execution. For example, this situation can
+   occur with load-locked store-conditional implementations. This
+   property is sometimes termed obstruction-free. — *end note*]
+ - When one or more lock-free executions run concurrently, at least one
+   should complete. [*Note 3*: It is difficult for some implementations
+   to provide absolute guarantees to this effect, since repeated and
+   particularly inopportune interference from other threads may prevent
+   forward progress, e.g., by repeatedly stealing a cache line for
+   unrelated purposes between load-locked and store-conditional
+   instructions. Implementations should ensure that such effects cannot
+   indefinitely delay progress under expected operating conditions, and
+   that such anomalies can therefore safely be ignored by programmers.
+   Outside this document, this property is sometimes termed
+   lock-free. — *end note*]
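
A short sketch of how a program can ask whether its atomics give
lock-free executions (the `Big` type is ours):
`std::atomic<T>::is_always_lock_free` is a compile-time constant (since
C++17), and `is_lock_free()` is a per-object run-time query;
`std::atomic_flag` is the only type required to be lock-free.

```cpp
#include <atomic>
#include <cstdio>

struct Big { char bytes[64]; };  // illustrative: likely too large for
                                 // lock-free hardware support

int main() {
    std::atomic<int> i{0};
    std::atomic<Big> b{};

    // Compile-time: true only if every object of the specialization
    // is lock-free.
    std::printf("atomic<int> always lock-free: %d\n",
                std::atomic<int>::is_always_lock_free ? 1 : 0);

    // Per-object run-time queries. A non-lock-free atomic is typically
    // implemented with an internal lock, so its operations are not
    // lock-free executions in the sense described above.
    std::printf("atomic<int>: %s\n", i.is_lock_free() ? "lock-free" : "locked");
    std::printf("atomic<Big>: %s\n", b.is_lock_free() ? "lock-free" : "locked");
}
```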

+ During the execution of a thread of execution, each of the following is
+ termed an *execution step*:
+
+ - termination of the thread of execution,
+ - performing an access through a volatile glvalue, or
+ - completion of a call to a library I/O function, a synchronization
+   operation, or an atomic operation.
+
+ An invocation of a standard library function that blocks (
+ [[defns.block]]) is considered to continuously execute execution steps
+ while waiting for the condition that it blocks on to be satisfied.
+
+ [*Example 1*: A library I/O function that blocks until the I/O
+ operation is complete can be considered to continuously check whether
+ the operation is complete. Each such check might consist of one or more
+ execution steps, for example using observable behavior of the abstract
+ machine. — *end example*]
+
+ [*Note 4*: Because of this and the preceding requirement regarding what
+ threads of execution have to perform eventually, it follows that no
+ thread of execution can execute forever without an execution step
+ occurring. — *end note*]
+
+ A thread of execution *makes progress* when an execution step occurs or
+ a lock-free execution does not complete because there are other
+ concurrent threads that are not blocked in a standard library function
+ (see above).
+
+ For a thread of execution providing *concurrent forward progress
+ guarantees*, the implementation ensures that the thread will eventually
+ make progress for as long as it has not terminated.
+
+ [*Note 5*: This is required regardless of whether or not other threads
+ of execution (if any) have been or are making progress. To eventually
+ fulfill this requirement means that this will happen in an unspecified
+ but finite amount of time. — *end note*]
+
+ It is *implementation-defined* whether the implementation-created thread
+ of execution that executes `main` ([[basic.start.main]]) and the
+ threads of execution created by `std::thread` ([[thread.thread.class]])
+ provide concurrent forward progress guarantees.
+
+ [*Note 6*: General-purpose implementations are encouraged to provide
+ these guarantees. — *end note*]
+
+ For a thread of execution providing *parallel forward progress
+ guarantees*, the implementation is not required to ensure that the
+ thread will eventually make progress if it has not yet executed any
+ execution step; once this thread has executed a step, it provides
+ concurrent forward progress guarantees.
+
+ [*Note 7*: This does not specify a requirement for when to start this
+ thread of execution, which will typically be specified by the entity
+ that creates this thread of execution. For example, a thread of
+ execution that provides concurrent forward progress guarantees and
+ executes tasks from a set of tasks in an arbitrary order, one after the
+ other, satisfies the requirements of parallel forward progress for these
+ tasks. — *end note*]
+
+ For a thread of execution providing *weakly parallel forward progress
+ guarantees*, the implementation does not ensure that the thread will
+ eventually make progress.
+
+ [*Note 8*: Threads of execution providing weakly parallel forward
+ progress guarantees cannot be expected to make progress regardless of
+ whether other threads make progress or not; however, blocking with
+ forward progress guarantee delegation, as defined below, can be used to
+ ensure that such threads of execution make progress
+ eventually. — *end note*]
+
+ Concurrent forward progress guarantees are stronger than parallel
+ forward progress guarantees, which in turn are stronger than weakly
+ parallel forward progress guarantees.
+
+ [*Note 9*: For example, some kinds of synchronization between threads
+ of execution may only make progress if the respective threads of
+ execution provide parallel forward progress guarantees, but will fail to
+ make progress under weakly parallel guarantees. — *end note*]
+
+ When a thread of execution *P* is specified to *block with forward
+ progress guarantee delegation* on the completion of a set *S* of threads
+ of execution, then throughout the whole time of *P* being blocked on
+ *S*, the implementation shall ensure that the forward progress
+ guarantees provided by at least one thread of execution in *S* is at
+ least as strong as *P*’s forward progress guarantees.
+
+ [*Note 10*: It is unspecified which thread or threads of execution in
+ *S* are chosen and for which number of execution steps. The
+ strengthening is not permanent and not necessarily in place for the rest
+ of the lifetime of the affected thread of execution. As long as *P* is
+ blocked, the implementation has to eventually select and potentially
+ strengthen a thread of execution in *S*. — *end note*]
+
+ Once a thread of execution in *S* terminates, it is removed from *S*.
+ Once *S* is empty, *P* is unblocked.
+
+ [*Note 11*: A thread of execution *B* thus can temporarily provide an
+ effectively stronger forward progress guarantee for a certain amount of
+ time, due to a second thread of execution *A* being blocked on it with
+ forward progress guarantee delegation. In turn, if *B* then blocks with
+ forward progress guarantee delegation on *C*, this may also temporarily
+ provide a stronger forward progress guarantee to *C*. — *end note*]
+
+ [*Note 12*: If all threads of execution in *S* finish executing (e.g.,
+ they terminate and do not use blocking synchronization incorrectly),
+ then *P*’s execution of the operation that blocks with forward progress
+ guarantee delegation will not result in *P*’s progress guarantee being
+ effectively weakened. — *end note*]
+
+ [*Note 13*: This does not remove any constraints regarding blocking
+ synchronization for threads of execution providing parallel or weakly
+ parallel forward progress guarantees because the implementation is not
+ required to strengthen a particular thread of execution whose too-weak
+ progress guarantee is preventing overall progress. — *end note*]

  An implementation should ensure that the last value (in modification
  order) assigned by an atomic or synchronization operation will become
  visible to all other threads in a finite period of time.
474