From Jason Turner

[intro.multithread]


Files changed (1)
  1. tmp/tmpdb8kobds/{from.md → to.md} +179 -175
tmp/tmpdb8kobds/{from.md → to.md} RENAMED
@@ -1,6 +1,6 @@
- ## Multi-threaded executions and data races <a id="intro.multithread">[[intro.multithread]]</a>

  A *thread of execution* (also known as a *thread*) is a single flow of
  control within a program, including the initial invocation of a specific
  top-level function, and recursively including every function invocation
  subsequently executed by the thread.
@@ -8,15 +8,15 @@ subsequently executed by the thread.
  [*Note 1*: When one thread creates another, the initial call to the
  top-level function of the new thread is executed by the new thread, not
  by the creating thread. — *end note*]

  Every thread in a program can potentially access every object and
- function in a program.[^11] Under a hosted implementation, a C++program
  can have more than one thread running concurrently. The execution of
- each thread proceeds as defined by the remainder of this International
- Standard. The execution of the entire program consists of an execution
- of all of its threads.

  [*Note 2*: Usually the execution can be viewed as an interleaving of
  all its threads. However, some kinds of atomic operations, for example,
  allow executions inconsistent with a simple interleaving, as described
  below. — *end note*]
@@ -26,71 +26,67 @@ whether a program can have more than one thread of execution.

  For a signal handler that is not executed as a result of a call to the
  `std::raise` function, it is unspecified which thread of execution
  contains the signal handler invocation.

- ### Data races <a id="intro.races">[[intro.races]]</a>

- The value of an object visible to a thread *T* at a particular point is
- the initial value of the object, a value assigned to the object by *T*,
- or a value assigned to the object by another thread, according to the
- rules below.

  [*Note 1*: In some cases, there may instead be undefined behavior. Much
- of this section is motivated by the desire to support atomic operations
- with explicit and detailed visibility constraints. However, it also
- implicitly supports a simpler view for more restricted
  programs. — *end note*]

  Two expression evaluations *conflict* if one of them modifies a memory
- location ([[intro.memory]]) and the other one reads or modifies the
- same memory location.

- The library defines a number of atomic operations (Clause  [[atomics]])
- and operations on mutexes (Clause  [[thread]]) that are specially
- identified as synchronization operations. These operations play a
- special role in making assignments in one thread visible to another. A
- synchronization operation on one or more memory locations is either a
- consume operation, an acquire operation, a release operation, or both an
- acquire and release operation. A synchronization operation without an
- associated memory location is a fence and can be either an acquire
- fence, a release fence, or both an acquire and release fence. In
- addition, there are relaxed atomic operations, which are not
- synchronization operations, and atomic read-modify-write operations,
- which have special characteristics.

  [*Note 2*: For example, a call that acquires a mutex will perform an
  acquire operation on the locations comprising the mutex.
  Correspondingly, a call that releases the same mutex will perform a
  release operation on those same locations. Informally, performing a
- release operation on *A* forces prior side effects on other memory
  locations to become visible to other threads that later perform a
- consume or an acquire operation on *A*. “Relaxed” atomic operations are
  not synchronization operations even though, like synchronization
  operations, they cannot contribute to data races. — *end note*]

- All modifications to a particular atomic object *M* occur in some
- particular total order, called the *modification order* of *M*.

  [*Note 3*: There is a separate order for each atomic object. There is
  no requirement that these can be combined into a single total order for
  all objects. In general this will be impossible since different threads
  may observe modifications to different objects in inconsistent
  orders. — *end note*]

- A *release sequence* headed by a release operation *A* on an atomic
- object *M* is a maximal contiguous sub-sequence of side effects in the
- modification order of *M*, where the first operation is `A`, and every
- subsequent operation
-
- - is performed by the same thread that performed `A`, or
- - is an atomic read-modify-write operation.

  Certain library calls *synchronize with* other library calls performed
  by another thread. For example, an atomic store-release synchronizes
- with a load-acquire that takes its value from the store (
- [[atomics.order]]).

  [*Note 4*: Except in the specified cases, reading a later value does
  not necessarily ensure visibility as described below. Such a requirement
  would sometimes interfere with efficient implementation. — *end note*]
 
@@ -98,55 +94,52 @@ would sometimes interfere with efficient implementation. — *end note*]
98
  when one reads the value written by another. For atomic objects, the
99
  definition is clear. All operations on a given mutex occur in a single
100
  total order. Each mutex acquisition “reads the value written” by the
101
  last mutex release. — *end note*]
102
 
103
- An evaluation *A* *carries a dependency* to an evaluation *B* if
104
 
105
- - the value of *A* is used as an operand of *B*, unless:
106
- - *B* is an invocation of any specialization of
107
- `std::kill_dependency` ([[atomics.order]]), or
108
- - *A* is the left operand of a built-in logical AND (`&&`, see 
109
- [[expr.log.and]]) or logical OR (`||`, see  [[expr.log.or]])
110
- operator, or
111
- - *A* is the left operand of a conditional (`?:`, see  [[expr.cond]])
112
  operator, or
113
- - *A* is the left operand of the built-in comma (`,`) operator (
114
- [[expr.comma]]);
115
 
116
  or
117
- - *A* writes a scalar object or bit-field *M*, *B* reads the value
118
- written by *A* from *M*, and *A* is sequenced before *B*, or
119
- - for some evaluation *X*, *A* carries a dependency to *X*, and *X*
120
- carries a dependency to *B*.
121
 
122
  [*Note 6*: “Carries a dependency to” is a subset of “is sequenced
123
  before”, and is similarly strictly intra-thread. — *end note*]

- An evaluation *A* is *dependency-ordered before* an evaluation *B* if

- - *A* performs a release operation on an atomic object *M*, and, in
- another thread, *B* performs a consume operation on *M* and reads a
- value written by any side effect in the release sequence headed by
- *A*, or
- - for some evaluation *X*, *A* is dependency-ordered before *X* and *X*
- carries a dependency to *B*.

  [*Note 7*: The relation “is dependency-ordered before” is analogous to
  “synchronizes with”, but uses release/consume in place of
  release/acquire. — *end note*]

- An evaluation *A* *inter-thread happens before* an evaluation *B* if

- - *A* synchronizes with *B*, or
- - *A* is dependency-ordered before *B*, or
- - for some evaluation *X*
- - *A* synchronizes with *X* and *X* is sequenced before *B*, or
- - *A* is sequenced before *X* and *X* inter-thread happens before *B*,
- or
- - *A* inter-thread happens before *X* and *X* inter-thread happens
- before *B*.

  [*Note 8*: The “inter-thread happens before” relation describes
  arbitrary concatenations of “sequenced before”, “synchronizes with” and
  “dependency-ordered before” relationships, with two exceptions. The
  first exception is that a concatenation is not permitted to end with
@@ -161,101 +154,111 @@ exception is that a concatenation is not permitted to consist entirely
  of “sequenced before”. The reasons for this limitation are (1) to permit
  “inter-thread happens before” to be transitively closed and (2) the
  “happens before” relation, defined below, provides for relationships
  consisting entirely of “sequenced before”. — *end note*]

- An evaluation *A* *happens before* an evaluation *B* (or, equivalently,
- *B* *happens after* *A*) if:

- - *A* is sequenced before *B*, or
- - *A* inter-thread happens before *B*.

  The implementation shall ensure that no program execution demonstrates a
  cycle in the “happens before” relation.

  [*Note 9*: This cycle would otherwise be possible only through the use
  of consume operations. — *end note*]

- An evaluation *A* *strongly happens before* an evaluation *B* if either

- - *A* is sequenced before *B*, or
- - *A* synchronizes with *B*, or
- - *A* strongly happens before *X* and *X* strongly happens before *B*.

  [*Note 10*: In the absence of consume operations, the happens before
- and strongly happens before relations are identical. Strongly happens
- before essentially excludes consume operations. — *end note*]

- A *visible side effect* *A* on a scalar object or bit-field *M* with
- respect to a value computation *B* of *M* satisfies the conditions:

- - *A* happens before *B* and
- - there is no other side effect *X* to *M* such that *A* happens before
- *X* and *X* happens before *B*.

- The value of a non-atomic scalar object or bit-field *M*, as determined
- by evaluation *B*, shall be the value stored by the visible side effect
- *A*.

- [*Note 11*: If there is ambiguity about which side effect to a
  non-atomic object or bit-field is visible, then the behavior is either
  unspecified or undefined. — *end note*]

- [*Note 12*: This states that operations on ordinary objects are not
  visibly reordered. This is not actually detectable without data races,
  but it is necessary to ensure that data races, as defined below, and
  with suitable restrictions on the use of atomics, correspond to data
  races in a simple interleaved (sequentially consistent)
  execution. — *end note*]

- The value of an atomic object *M*, as determined by evaluation *B*,
- shall be the value stored by some side effect *A* that modifies *M*,
- where *B* does not happen before *A*.

- [*Note 13*: The set of such side effects is also restricted by the rest
  of the rules described here, and in particular, by the coherence
  requirements below. — *end note*]

- If an operation *A* that modifies an atomic object *M* happens before an
- operation *B* that modifies *M*, then *A* shall be earlier than *B* in
- the modification order of *M*.

- [*Note 14*: This requirement is known as write-write
  coherence. — *end note*]

- If a value computation *A* of an atomic object *M* happens before a
- value computation *B* of *M*, and *A* takes its value from a side effect
- *X* on *M*, then the value computed by *B* shall either be the value
- stored by *X* or the value stored by a side effect *Y* on *M*, where *Y*
- follows *X* in the modification order of *M*.

- [*Note 15*: This requirement is known as read-read
  coherence. — *end note*]

- If a value computation *A* of an atomic object *M* happens before an
- operation *B* that modifies *M*, then *A* shall take its value from a
- side effect *X* on *M*, where *X* precedes *B* in the modification order
- of *M*.

- [*Note 16*: This requirement is known as read-write
  coherence. — *end note*]

- If a side effect *X* on an atomic object *M* happens before a value
- computation *B* of *M*, then the evaluation *B* shall take its value
- from *X* or from a side effect *Y* that follows *X* in the modification
- order of *M*.

- [*Note 17*: This requirement is known as write-read
  coherence. — *end note*]

- [*Note 18*: The four preceding coherence requirements effectively
  disallow compiler reordering of atomic operations to a single object,
  even if both operations are relaxed loads. This effectively makes the
- cache coherence guarantee provided by most hardware available to
- C++atomic operations. — *end note*]
 
256
- [*Note 19*: The value observed by a load of an atomic depends on the
257
  “happens before” relation, which depends on the values observed by loads
258
  of atomics. The intended reading is that there must exist an association
259
  of atomic loads with modifications they observe that, together with
260
  suitably chosen modification orders and the “happens before” relation
261
  derived as described above, satisfy the resulting constraints as imposed
@@ -271,12 +274,12 @@ The execution of a program contains a *data race* if it contains two
  potentially concurrent conflicting actions, at least one of which is not
  atomic, and neither happens before the other, except for the special
  case for signal handlers described below. Any such data race results in
  undefined behavior.
 
276
- [*Note 20*: It can be shown that programs that correctly use mutexes
277
- and `memory_order_seq_cst` operations to prevent all data races and use
278
  no other synchronization operations behave as if the operations executed
279
  by their constituent threads were simply interleaved, with each value
280
  computation of an object being taken from the last side effect on that
281
  object in that interleaving. This is normally referred to as “sequential
282
  consistency”. However, this applies only to data-race-free programs, and
@@ -288,38 +291,38 @@ undefined operation. — *end note*]

  Two accesses to the same object of type `volatile std::sig_atomic_t` do
  not result in a data race if both occur in the same thread, even if one
  or more occurs in a signal handler. For each signal handler invocation,
  evaluations performed by the thread invoking a signal handler can be
- divided into two groups *A* and *B*, such that no evaluations in *B*
- happen before evaluations in *A*, and the evaluations of such
  `volatile std::sig_atomic_t` objects take values as though all
- evaluations in *A* happened before the execution of the signal handler
- and the execution of the signal handler happened before all evaluations
- in *B*.
 
300
- [*Note 21*: Compiler transformations that introduce assignments to a
301
  potentially shared memory location that would not be modified by the
302
- abstract machine are generally precluded by this International Standard,
303
- since such an assignment might overwrite another assignment by a
304
- different thread in cases in which an abstract machine execution would
305
- not have encountered a data race. This includes implementations of data
306
- member assignment that overwrite adjacent members in separate memory
307
- locations. Reordering of atomic loads in cases in which the atomics in
308
- question may alias is also generally precluded, since this may violate
309
- the coherence rules. — *end note*]
310
 
311
- [*Note 22*: Transformations that introduce a speculative read of a
312
  potentially shared memory location may not preserve the semantics of the
313
- C++program as defined in this International Standard, since they
314
- potentially introduce a data race. However, they are typically valid in
315
- the context of an optimizing compiler that targets a specific machine
316
- with well-defined semantics for data races. They would be invalid for a
317
  hypothetical machine that is not tolerant of races or provides hardware
318
  race detection. — *end note*]
319
 
320
- ### Forward progress <a id="intro.progress">[[intro.progress]]</a>
321
 
322
  The implementation may assume that any thread will eventually do one of
323
  the following:
324
 
325
  - terminate,
@@ -329,17 +332,17 @@ the following:

  [*Note 1*: This is intended to allow compiler transformations such as
  removal of empty loops, even when termination cannot be
  proven. — *end note*]

- Executions of atomic functions that are either defined to be lock-free (
- [[atomics.flag]]) or indicated as lock-free ([[atomics.lockfree]]) are
  *lock-free executions*.

- - If there is only one thread that is not blocked ([[defns.block]]) in
- a standard library function, a lock-free execution in that thread
- shall complete. \[*Note 2*: Concurrently executing threads may prevent
  progress of a lock-free execution. For example, this situation can
  occur with load-locked store-conditional implementations. This
  property is sometimes termed obstruction-free. — *end note*]
  - When one or more lock-free executions run concurrently, at least one
  should complete. \[*Note 3*: It is difficult for some implementations
@@ -359,13 +362,13 @@ termed an *execution step*:
  - termination of the thread of execution,
  - performing an access through a volatile glvalue, or
  - completion of a call to a library I/O function, a synchronization
  operation, or an atomic operation.

- An invocation of a standard library function that blocks (
- [[defns.block]]) is considered to continuously execute execution steps
- while waiting for the condition that it blocks on to be satisfied.

  [*Example 1*: A library I/O function that blocks until the I/O
  operation is complete can be considered to continuously check whether
  the operation is complete. Each such check might consist of one or more
  execution steps, for example using observable behavior of the abstract
@@ -389,16 +392,17 @@ make progress for as long as it has not terminated.
  of executions (if any) have been or are making progress. To eventually
  fulfill this requirement means that this will happen in an unspecified
  but finite amount of time. — *end note*]

  It is *implementation-defined* whether the implementation-created thread
- of execution that executes `main` ([[basic.start.main]]) and the
- threads of execution created by `std::thread` ([[thread.thread.class]])
- provide concurrent forward progress guarantees.

- [*Note 6*: General-purpose implementations are encouraged to provide
- these guarantees. — *end note*]

  For a thread of execution providing *parallel forward progress
  guarantees*, the implementation is not required to ensure that the
  thread will eventually make progress if it has not yet executed any
  execution step; once this thread has executed a step, it provides
@@ -430,38 +434,38 @@ parallel forward progress guarantees.
  [*Note 9*: For example, some kinds of synchronization between threads
  of execution may only make progress if the respective threads of
  execution provide parallel forward progress guarantees, but will fail to
  make progress under weakly parallel guarantees. — *end note*]

- When a thread of execution *P* is specified to *block with forward
- progress guarantee delegation* on the completion of a set *S* of threads
- of execution, then throughout the whole time of *P* being blocked on
- *S*, the implementation shall ensure that the forward progress
- guarantees provided by at least one thread of execution in *S* is at
- least as strong as *P*’s forward progress guarantees.

- [*Note 10*: It is unspecified which thread or threads of execution in
- *S* are chosen and for which number of execution steps. The
- strengthening is not permanent and not necessarily in place for the rest
- of the lifetime of the affected thread of execution. As long as *P* is
- blocked, the implementation has to eventually select and potentially
- strengthen a thread of execution in *S*. — *end note*]

- Once a thread of execution in *S* terminates, it is removed from *S*.
- Once *S* is empty, *P* is unblocked.

- [*Note 11*: A thread of execution *B* thus can temporarily provide an
  effectively stronger forward progress guarantee for a certain amount of
- time, due to a second thread of execution *A* being blocked on it with
- forward progress guarantee delegation. In turn, if *B* then blocks with
- forward progress guarantee delegation on *C*, this may also temporarily
- provide a stronger forward progress guarantee to *C*. — *end note*]

- [*Note 12*: If all threads of execution in *S* finish executing (e.g.,
  they terminate and do not use blocking synchronization incorrectly),
- then *P*’s execution of the operation that blocks with forward progress
- guarantee delegation will not result in *P*’s progress guarantee being
  effectively weakened. — *end note*]

  [*Note 13*: This does not remove any constraints regarding blocking
  synchronization for threads of execution providing parallel or weakly
  parallel forward progress guarantees because the implementation is not
 
+ ### Multi-threaded executions and data races <a id="intro.multithread">[[intro.multithread]]</a>

  A *thread of execution* (also known as a *thread*) is a single flow of
  control within a program, including the initial invocation of a specific
  top-level function, and recursively including every function invocation
  subsequently executed by the thread.

  [*Note 1*: When one thread creates another, the initial call to the
  top-level function of the new thread is executed by the new thread, not
  by the creating thread. — *end note*]

  Every thread in a program can potentially access every object and
+ function in a program.[^28] Under a hosted implementation, a C++ program
  can have more than one thread running concurrently. The execution of
+ each thread proceeds as defined by the remainder of this document. The
+ execution of the entire program consists of an execution of all of its
+ threads.

  [*Note 2*: Usually the execution can be viewed as an interleaving of
  all its threads. However, some kinds of atomic operations, for example,
  allow executions inconsistent with a simple interleaving, as described
  below. — *end note*]
 

  For a signal handler that is not executed as a result of a call to the
  `std::raise` function, it is unspecified which thread of execution
  contains the signal handler invocation.

+ #### Data races <a id="intro.races">[[intro.races]]</a>

+ The value of an object visible to a thread T at a particular point is
+ the initial value of the object, a value assigned to the object by T, or
+ a value assigned to the object by another thread, according to the rules
+ below.

  [*Note 1*: In some cases, there may instead be undefined behavior. Much
+ of this subclause is motivated by the desire to support atomic
+ operations with explicit and detailed visibility constraints. However,
+ it also implicitly supports a simpler view for more restricted
  programs. — *end note*]

  Two expression evaluations *conflict* if one of them modifies a memory
+ location [[intro.memory]] and the other one reads or modifies the same
+ memory location.

+ The library defines a number of atomic operations [[atomics]] and
+ operations on mutexes [[thread]] that are specially identified as
+ synchronization operations. These operations play a special role in
+ making assignments in one thread visible to another. A synchronization
+ operation on one or more memory locations is either a consume operation,
+ an acquire operation, a release operation, or both an acquire and
+ release operation. A synchronization operation without an associated
+ memory location is a fence and can be either an acquire fence, a release
+ fence, or both an acquire and release fence. In addition, there are
+ relaxed atomic operations, which are not synchronization operations, and
+ atomic read-modify-write operations, which have special characteristics.

  [*Note 2*: For example, a call that acquires a mutex will perform an
  acquire operation on the locations comprising the mutex.
  Correspondingly, a call that releases the same mutex will perform a
  release operation on those same locations. Informally, performing a
+ release operation on A forces prior side effects on other memory
  locations to become visible to other threads that later perform a
+ consume or an acquire operation on A. “Relaxed” atomic operations are
  not synchronization operations even though, like synchronization
  operations, they cannot contribute to data races. — *end note*]

+ All modifications to a particular atomic object M occur in some
+ particular total order, called the *modification order* of M.

  [*Note 3*: There is a separate order for each atomic object. There is
  no requirement that these can be combined into a single total order for
  all objects. In general this will be impossible since different threads
  may observe modifications to different objects in inconsistent
  orders. — *end note*]

+ A *release sequence* headed by a release operation A on an atomic object
+ M is a maximal contiguous sub-sequence of side effects in the
+ modification order of M, where the first operation is A, and every
+ subsequent operation is an atomic read-modify-write operation.
 
 
 

  Certain library calls *synchronize with* other library calls performed
  by another thread. For example, an atomic store-release synchronizes
+ with a load-acquire that takes its value from the store
+ [[atomics.order]].

  [*Note 4*: Except in the specified cases, reading a later value does
  not necessarily ensure visibility as described below. Such a requirement
  would sometimes interfere with efficient implementation. — *end note*]

  when one reads the value written by another. For atomic objects, the
  definition is clear. All operations on a given mutex occur in a single
  total order. Each mutex acquisition “reads the value written” by the
  last mutex release. — *end note*]

+ An evaluation A *carries a dependency* to an evaluation B if

+ - the value of A is used as an operand of B, unless:
+ - B is an invocation of any specialization of `std::kill_dependency`
+ [[atomics.order]], or
+ - A is the left operand of a built-in logical AND (`&&`, see
+ [[expr.log.and]]) or logical OR (`||`, see [[expr.log.or]]) operator,
+ or
+ - A is the left operand of a conditional (`?:`, see [[expr.cond]])
  operator, or
+ - A is the left operand of the built-in comma (`,`) operator
+ [[expr.comma]];

  or
+ - A writes a scalar object or bit-field M, B reads the value written by
+ A from M, and A is sequenced before B, or
+ - for some evaluation X, A carries a dependency to X, and X carries a
+ dependency to B.

  [*Note 6*: “Carries a dependency to” is a subset of “is sequenced
  before”, and is similarly strictly intra-thread. — *end note*]

+ An evaluation A is *dependency-ordered before* an evaluation B if

+ - A performs a release operation on an atomic object M, and, in another
+ thread, B performs a consume operation on M and reads the value
+ written by A, or
+ - for some evaluation X, A is dependency-ordered before X and X carries
+ a dependency to B.

  [*Note 7*: The relation “is dependency-ordered before” is analogous to
  “synchronizes with”, but uses release/consume in place of
  release/acquire. — *end note*]

+ An evaluation A *inter-thread happens before* an evaluation B if

+ - A synchronizes with B, or
+ - A is dependency-ordered before B, or
+ - for some evaluation X
+ - A synchronizes with X and X is sequenced before B, or
+ - A is sequenced before X and X inter-thread happens before B, or
+ - A inter-thread happens before X and X inter-thread happens before B.

  [*Note 8*: The “inter-thread happens before” relation describes
  arbitrary concatenations of “sequenced before”, “synchronizes with” and
  “dependency-ordered before” relationships, with two exceptions. The
  first exception is that a concatenation is not permitted to end with
  of “sequenced before”. The reasons for this limitation are (1) to permit
  “inter-thread happens before” to be transitively closed and (2) the
  “happens before” relation, defined below, provides for relationships
  consisting entirely of “sequenced before”. — *end note*]

+ An evaluation A *happens before* an evaluation B (or, equivalently, B
+ *happens after* A) if:

+ - A is sequenced before B, or
+ - A inter-thread happens before B.

  The implementation shall ensure that no program execution demonstrates a
  cycle in the “happens before” relation.

  [*Note 9*: This cycle would otherwise be possible only through the use
  of consume operations. — *end note*]

+ An evaluation A *simply happens before* an evaluation B if either

+ - A is sequenced before B, or
+ - A synchronizes with B, or
+ - A simply happens before X and X simply happens before B.

  [*Note 10*: In the absence of consume operations, the happens before
+ and simply happens before relations are identical. — *end note*]

+ An evaluation A *strongly happens before* an evaluation D if, either

+ - A is sequenced before D, or
+ - A synchronizes with D, and both A and D are sequentially consistent
+ atomic operations [[atomics.order]], or
+ - there are evaluations B and C such that A is sequenced before B, B
+ simply happens before C, and C is sequenced before D, or
+ - there is an evaluation B such that A strongly happens before B, and B
+ strongly happens before D.

+ [*Note 11*: Informally, if A strongly happens before B, then A appears
+ to be evaluated before B in all contexts. Strongly happens before
+ excludes consume operations. — *end note*]

+ A *visible side effect* A on a scalar object or bit-field M with respect
+ to a value computation B of M satisfies the conditions:
+
+ - A happens before B and
+ - there is no other side effect X to M such that A happens before X and
+ X happens before B.
+
+ The value of a non-atomic scalar object or bit-field M, as determined by
+ evaluation B, shall be the value stored by the visible side effect A.
+
+ [*Note 12*: If there is ambiguity about which side effect to a
  non-atomic object or bit-field is visible, then the behavior is either
  unspecified or undefined. — *end note*]

+ [*Note 13*: This states that operations on ordinary objects are not
  visibly reordered. This is not actually detectable without data races,
  but it is necessary to ensure that data races, as defined below, and
  with suitable restrictions on the use of atomics, correspond to data
  races in a simple interleaved (sequentially consistent)
  execution. — *end note*]
214
 
215
+ The value of an atomic object M, as determined by evaluation B, shall be
216
+ the value stored by some side effect A that modifies M, where B does not
217
+ happen before A.
218
 
219
+ [*Note 14*: The set of such side effects is also restricted by the rest
220
  of the rules described here, and in particular, by the coherence
221
  requirements below. — *end note*]
222
 
223
+ If an operation A that modifies an atomic object M happens before an
224
+ operation B that modifies M, then A shall be earlier than B in the
225
+ modification order of M.
226
 
227
+ [*Note 15*: This requirement is known as write-write
228
  coherence. — *end note*]
229
 
230
+ If a value computation A of an atomic object M happens before a value
231
+ computation B of M, and A takes its value from a side effect X on M,
232
+ then the value computed by B shall either be the value stored by X or
233
+ the value stored by a side effect Y on M, where Y follows X in the
234
+ modification order of M.
235
 
236
+ [*Note 16*: This requirement is known as read-read
237
  coherence. — *end note*]
238
 
239
+ If a value computation A of an atomic object M happens before an
240
+ operation B that modifies M, then A shall take its value from a side
241
+ effect X on M, where X precedes B in the modification order of M.
 
242
 
243
+ [*Note 17*: This requirement is known as read-write
244
  coherence. — *end note*]
245
 
246
+ If a side effect X on an atomic object M happens before a value
247
+ computation B of M, then the evaluation B shall take its value from X or
248
+ from a side effect Y that follows X in the modification order of M.
 
249
 
250
+ [*Note 18*: This requirement is known as write-read
251
  coherence. — *end note*]
252
 
253
+ [*Note 19*: The four preceding coherence requirements effectively
254
  disallow compiler reordering of atomic operations to a single object,
255
  even if both operations are relaxed loads. This effectively makes the
256
+ cache coherence guarantee provided by most hardware available to C++
257
+ atomic operations. — *end note*]

[*Note 20*: The value observed by a load of an atomic depends on the
“happens before” relation, which depends on the values observed by loads
of atomics. The intended reading is that there must exist an association
of atomic loads with modifications they observe that, together with
suitably chosen modification orders and the “happens before” relation
derived as described above, satisfy the resulting constraints as imposed

potentially concurrent conflicting actions, at least one of which is not
atomic, and neither happens before the other, except for the special
case for signal handlers described below. Any such data race results in
undefined behavior.

[*Note 21*: It can be shown that programs that correctly use mutexes
and `memory_order::seq_cst` operations to prevent all data races and use
no other synchronization operations behave as if the operations executed
by their constituent threads were simply interleaved, with each value
computation of an object being taken from the last side effect on that
object in that interleaving. This is normally referred to as “sequential
consistency”. However, this applies only to data-race-free programs, and
 
Two accesses to the same object of type `volatile std::sig_atomic_t` do
not result in a data race if both occur in the same thread, even if one
or more occurs in a signal handler. For each signal handler invocation,
evaluations performed by the thread invoking a signal handler can be
divided into two groups A and B, such that no evaluations in B happen
before evaluations in A, and the evaluations of such
`volatile std::sig_atomic_t` objects take values as though all
evaluations in A happened before the execution of the signal handler and
the execution of the signal handler happened before all evaluations in
B.

[*Note 22*: Compiler transformations that introduce assignments to a
potentially shared memory location that would not be modified by the
abstract machine are generally precluded by this document, since such an
assignment might overwrite another assignment by a different thread in
cases in which an abstract machine execution would not have encountered
a data race. This includes implementations of data member assignment
that overwrite adjacent members in separate memory locations. Reordering
of atomic loads in cases in which the atomics in question may alias is
also generally precluded, since this may violate the coherence
rules. — *end note*]

[*Note 23*: Transformations that introduce a speculative read of a
potentially shared memory location may not preserve the semantics of the
C++ program as defined in this document, since they potentially
introduce a data race. However, they are typically valid in the context
of an optimizing compiler that targets a specific machine with
well-defined semantics for data races. They would be invalid for a
hypothetical machine that is not tolerant of races or provides hardware
race detection. — *end note*]

#### Forward progress <a id="intro.progress">[[intro.progress]]</a>

The implementation may assume that any thread will eventually do one of
the following:

- terminate,

[*Note 1*: This is intended to allow compiler transformations such as
removal of empty loops, even when termination cannot be
proven. — *end note*]

Executions of atomic functions that are either defined to be lock-free
[[atomics.flag]] or indicated as lock-free [[atomics.lockfree]] are
*lock-free executions*.

- If there is only one thread that is not blocked [[defns.block]] in a
  standard library function, a lock-free execution in that thread shall
  complete. \[*Note 2*: Concurrently executing threads may prevent
  progress of a lock-free execution. For example, this situation can
  occur with load-locked store-conditional implementations. This
  property is sometimes termed obstruction-free. — *end note*]
- When one or more lock-free executions run concurrently, at least one
  should complete. \[*Note 3*: It is difficult for some implementations

- termination of the thread of execution,
- performing an access through a volatile glvalue, or
- completion of a call to a library I/O function, a synchronization
  operation, or an atomic operation.

An invocation of a standard library function that blocks [[defns.block]]
is considered to continuously execute execution steps while waiting for
the condition that it blocks on to be satisfied.

[*Example 1*: A library I/O function that blocks until the I/O
operation is complete can be considered to continuously check whether
the operation is complete. Each such check might consist of one or more
execution steps, for example using observable behavior of the abstract
 
of executions (if any) have been or are making progress. To eventually
fulfill this requirement means that this will happen in an unspecified
but finite amount of time. — *end note*]

It is *implementation-defined* whether the implementation-created thread
of execution that executes `main` [[basic.start.main]] and the threads
of execution created by `std::thread` [[thread.thread.class]] or
`std::jthread` [[thread.jthread.class]] provide concurrent forward
progress guarantees.

[*Note 6*: General-purpose implementations should provide these
guarantees. — *end note*]

For a thread of execution providing *parallel forward progress
guarantees*, the implementation is not required to ensure that the
thread will eventually make progress if it has not yet executed any
execution step; once this thread has executed a step, it provides

[*Note 9*: For example, some kinds of synchronization between threads
of execution may only make progress if the respective threads of
execution provide parallel forward progress guarantees, but will fail to
make progress under weakly parallel guarantees. — *end note*]

When a thread of execution P is specified to *block with forward
progress guarantee delegation* on the completion of a set S of threads
of execution, then throughout the whole time of P being blocked on S,
the implementation shall ensure that the forward progress guarantees
provided by at least one thread of execution in S is at least as strong
as P’s forward progress guarantees.

[*Note 10*: It is unspecified which thread or threads of execution in S
are chosen and for which number of execution steps. The strengthening is
not permanent and not necessarily in place for the rest of the lifetime
of the affected thread of execution. As long as P is blocked, the
implementation has to eventually select and potentially strengthen a
thread of execution in S. — *end note*]

Once a thread of execution in S terminates, it is removed from S. Once S
is empty, P is unblocked.

[*Note 11*: A thread of execution B thus can temporarily provide an
effectively stronger forward progress guarantee for a certain amount of
time, due to a second thread of execution A being blocked on it with
forward progress guarantee delegation. In turn, if B then blocks with
forward progress guarantee delegation on C, this may also temporarily
provide a stronger forward progress guarantee to C. — *end note*]

[*Note 12*: If all threads of execution in S finish executing (e.g.,
they terminate and do not use blocking synchronization incorrectly),
then P’s execution of the operation that blocks with forward progress
guarantee delegation will not result in P’s progress guarantee being
effectively weakened. — *end note*]

[*Note 13*: This does not remove any constraints regarding blocking
synchronization for threads of execution providing parallel or weakly
parallel forward progress guarantees because the implementation is not