From Jason Turner

[algorithms.parallel.exec]

Diff to HTML by rtfpessoa

Files changed (1): tmp/tmpj9rrdd_n/{from.md → to.md} (renamed, +65 −62)
@@ -1,46 +1,76 @@
  ### Effect of execution policies on algorithm execution <a id="algorithms.parallel.exec">[[algorithms.parallel.exec]]</a>

- Parallel algorithms have template parameters named `ExecutionPolicy` (
- [[execpol]]) which describe the manner in which the execution of these
  algorithms may be parallelized and the manner in which they apply the
  element access functions.

  Unless otherwise stated, implementations may make arbitrary copies of
  elements (with type `T`) from sequences where
  `is_trivially_copy_constructible_v<T>` and
  `is_trivially_destructible_v<T>` are `true`.

- [*Note 1*: This implies that user-supplied function objects should not
  rely on object identity of arguments for such input sequences. Users for
  whom the object identity of the arguments to these function objects is
  important should consider using a wrapping iterator that returns a
- non-copied implementation object such as `reference_wrapper<T>` (
- [[refwrap]]) or some equivalent solution. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
  `execution::sequenced_policy` all occur in the calling thread of
  execution.

- [*Note 2*: The invocations are not interleaved; see
  [[intro.execution]]. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
- `execution::parallel_policy` are permitted to execute in either the
  invoking thread of execution or in a thread of execution implicitly
  created by the library to support parallel algorithm execution. If the
- threads of execution created by `thread` ([[thread.thread.class]])
- provide concurrent forward progress guarantees ([[intro.progress]]),
- then a thread of execution implicitly created by the library will
- provide parallel forward progress guarantees; otherwise, the provided
- forward progress guarantee is *implementation-defined*. Any such
- invocations executing in the same thread of execution are
- indeterminately sequenced with respect to each other.

- [*Note 3*: It is the caller’s responsibility to ensure that the
  invocation does not introduce data races or deadlocks. — *end note*]

  [*Example 1*:

  ``` cpp
@@ -60,13 +90,13 @@ to the container `v`.

  ``` cpp
  std::atomic<int> x{0};
  int a[] = {1,2};
  std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
-   x.fetch_add(1, std::memory_order_relaxed);
    // spin wait for another iteration to change the value of x
-   while (x.load(std::memory_order_relaxed) == 1) { } // incorrect: assumes execution order
  });
  ```

  The above example depends on the order of execution of the iterations,
  and will not terminate if both iterations are executed sequentially on
@@ -90,78 +120,51 @@ The above example synchronizes access to object `x` ensuring that it is
  incremented correctly.

  — *end example*]

  The invocations of element access functions in parallel algorithms
- invoked with an execution policy of type
  `execution::parallel_unsequenced_policy` are permitted to execute in an
  unordered fashion in unspecified threads of execution, and unsequenced
  with respect to one another within each thread of execution. These
  threads of execution are either the invoking thread of execution or
  threads of execution implicitly created by the library; the latter will
  provide weakly parallel forward progress guarantees.

- [*Note 4*: This means that multiple function object invocations may be
  interleaved on a single thread of execution, which overrides the usual
  guarantee from [[intro.execution]] that function executions do not
- interleave with one another. — *end note*]

- Since `execution::parallel_unsequenced_policy` allows the execution of
- element access functions to be interleaved on a single thread of
- execution, blocking synchronization, including the use of mutexes, risks
- deadlock. Thus, the synchronization with
- `execution::parallel_unsequenced_policy` is restricted as follows: A
- standard library function is *vectorization-unsafe* if it is specified
- to synchronize with another function invocation, or another function
- invocation is specified to synchronize with it, and if it is not a
- memory allocation or deallocation function. Vectorization-unsafe
- standard library functions may not be invoked by user code called from
- `execution::parallel_unsequenced_policy` algorithms.

- [*Note 5*: Implementations must ensure that internal synchronization
- inside standard library functions does not prevent forward progress when
- those functions are executed by threads of execution with weakly
- parallel forward progress guarantees. — *end note*]

- [*Example 4*:
-
- ``` cpp
- int x = 0;
- std::mutex m;
- int a[] = {1,2};
- std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
-   std::lock_guard<mutex> guard(m); // incorrect: lock_guard constructor calls m.lock()
-   ++x;
- });
- ```
-
- The above program may result in two consecutive calls to `m.lock()` on
- the same thread of execution (which may deadlock), because the
- applications of the function object are not guaranteed to run on
- different threads of execution.
-
- — *end example*]
-
- [*Note 6*: The semantics of the `execution::parallel_policy` or the
- `execution::parallel_unsequenced_policy` invocation allow the
- implementation to fall back to sequential execution if the system cannot
- parallelize an algorithm invocation due to lack of
- resources. — *end note*]

  If an invocation of a parallel algorithm uses threads of execution
  implicitly created by the library, then the invoking thread of execution
  will either

- - temporarily block with forward progress guarantee delegation (
-   [[intro.progress]]) on the completion of these library-managed threads
    of execution, or
  - eventually execute an element access function;

  the thread of execution will continue to do so until the algorithm is
  finished.

- [*Note 7*: In blocking with forward progress guarantee delegation in
  this context, a thread of execution created by the library is considered
  to have finished execution as soon as it has finished the execution of
  the particular element access function that the invoking thread of
  execution logically depends on. — *end note*]
 
 
  ### Effect of execution policies on algorithm execution <a id="algorithms.parallel.exec">[[algorithms.parallel.exec]]</a>

+ Parallel algorithms have template parameters named `ExecutionPolicy`
+ [[execpol]] which describe the manner in which the execution of these
  algorithms may be parallelized and the manner in which they apply the
  element access functions.

+ If an object is modified by an element access function, the algorithm
+ will perform no other unsynchronized accesses to that object. The
+ modifying element access functions are those which are specified as
+ modifying the object.
+
+ [*Note 1*: For example, `swap`, `++`, `--`, `@=`, and assignments
+ modify the object. For the assignment and `@=` operators, only the left
+ argument is modified. — *end note*]
+
  Unless otherwise stated, implementations may make arbitrary copies of
  elements (with type `T`) from sequences where
  `is_trivially_copy_constructible_v<T>` and
  `is_trivially_destructible_v<T>` are `true`.

+ [*Note 2*: This implies that user-supplied function objects should not
  rely on object identity of arguments for such input sequences. Users for
  whom the object identity of the arguments to these function objects is
  important should consider using a wrapping iterator that returns a
+ non-copied implementation object such as `reference_wrapper<T>`
+ [[refwrap]] or some equivalent solution. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
  `execution::sequenced_policy` all occur in the calling thread of
  execution.

+ [*Note 3*: The invocations are not interleaved; see
  [[intro.execution]]. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
+ `execution::unsequenced_policy` are permitted to execute in an unordered
+ fashion in the calling thread of execution, unsequenced with respect to
+ one another in the calling thread of execution.
+
+ [*Note 4*: This means that multiple function object invocations may be
+ interleaved on a single thread of execution, which overrides the usual
+ guarantee from [[intro.execution]] that function executions do not
+ overlap with one another. — *end note*]
+
+ The behavior of a program is undefined if it invokes a
+ vectorization-unsafe standard library function from user code called
+ from an `execution::unsequenced_policy` algorithm.
+
+ [*Note 5*: Because `execution::unsequenced_policy` allows the execution
+ of element access functions to be interleaved on a single thread of
+ execution, blocking synchronization, including the use of mutexes, risks
+ deadlock. — *end note*]
+
+ The invocations of element access functions in parallel algorithms
+ invoked with an execution policy object of type
+ `execution::parallel_policy` are permitted to execute either in the
  invoking thread of execution or in a thread of execution implicitly
  created by the library to support parallel algorithm execution. If the
+ threads of execution created by `thread` [[thread.thread.class]] or
+ `jthread` [[thread.jthread.class]] provide concurrent forward progress
+ guarantees [[intro.progress]], then a thread of execution implicitly
+ created by the library will provide parallel forward progress
+ guarantees; otherwise, the provided forward progress guarantee is
+ *implementation-defined*. Any such invocations executing in the same
+ thread of execution are indeterminately sequenced with respect to each
+ other.

+ [*Note 6*: It is the caller’s responsibility to ensure that the
  invocation does not introduce data races or deadlocks. — *end note*]

  [*Example 1*:

  ``` cpp
@@ -60,13 +90,13 @@ to the container `v`.

  ``` cpp
  std::atomic<int> x{0};
  int a[] = {1,2};
  std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
+   x.fetch_add(1, std::memory_order::relaxed);
    // spin wait for another iteration to change the value of x
+   while (x.load(std::memory_order::relaxed) == 1) { } // incorrect: assumes execution order
  });
  ```

  The above example depends on the order of execution of the iterations,
  and will not terminate if both iterations are executed sequentially on
@@ -90,78 +120,51 @@ The above example synchronizes access to object `x` ensuring that it is
  incremented correctly.

  — *end example*]

  The invocations of element access functions in parallel algorithms
+ invoked with an execution policy object of type
  `execution::parallel_unsequenced_policy` are permitted to execute in an
  unordered fashion in unspecified threads of execution, and unsequenced
  with respect to one another within each thread of execution. These
  threads of execution are either the invoking thread of execution or
  threads of execution implicitly created by the library; the latter will
  provide weakly parallel forward progress guarantees.

+ [*Note 7*: This means that multiple function object invocations may be
  interleaved on a single thread of execution, which overrides the usual
  guarantee from [[intro.execution]] that function executions do not
+ overlap with one another. — *end note*]

+ The behavior of a program is undefined if it invokes a
+ vectorization-unsafe standard library function from user code called
+ from an `execution::parallel_unsequenced_policy` algorithm.

+ [*Note 8*: Because `execution::parallel_unsequenced_policy` allows the
+ execution of element access functions to be interleaved on a single
+ thread of execution, blocking synchronization, including the use of
+ mutexes, risks deadlock. — *end note*]

+ [*Note 9*: The semantics of invocation with
+ `execution::unsequenced_policy`, `execution::parallel_policy`, or
+ `execution::parallel_unsequenced_policy` allow the implementation to
+ fall back to sequential execution if the system cannot parallelize an
+ algorithm invocation, e.g., due to lack of resources. — *end note*]

  If an invocation of a parallel algorithm uses threads of execution
  implicitly created by the library, then the invoking thread of execution
  will either

+ - temporarily block with forward progress guarantee delegation
+   [[intro.progress]] on the completion of these library-managed threads
    of execution, or
  - eventually execute an element access function;

  the thread of execution will continue to do so until the algorithm is
  finished.

+ [*Note 10*: In blocking with forward progress guarantee delegation in
  this context, a thread of execution created by the library is considered
  to have finished execution as soon as it has finished the execution of
  the particular element access function that the invoking thread of
  execution logically depends on. — *end note*]