From Jason Turner

[algorithms.parallel]

Diff to HTML by rtfpessoa

Files changed (1):
  1. tmp/tmppm29sv2y/{from.md → to.md} +105 -71 (renamed)
@@ -1,15 +1,15 @@
  ## Parallel algorithms <a id="algorithms.parallel">[[algorithms.parallel]]</a>

- This section describes components that C++programs may use to perform
- operations on containers and other sequences in parallel.

- ### Terms and definitions <a id="algorithms.parallel.defns">[[algorithms.parallel.defns]]</a>

- A *parallel algorithm* is a function template listed in this
- International Standard with a template parameter named
- `ExecutionPolicy`.

  Parallel algorithms access objects indirectly accessible via their
  arguments by invoking the following functions:

  - All operations of the categories of the iterators that the algorithm
@@ -17,11 +17,11 @@ arguments by invoking the following functions:
  - Operations on those sequence elements that are required by its
  specification.
  - User-provided function objects to be applied during the execution of
  the algorithm, if required by the specification.
  - Operations on those function objects required by the specification.
- \[*Note 1*: See [[algorithms.general]]. — *end note*]

  These functions are herein called *element access functions*.

  [*Example 1*:

@@ -34,62 +34,123 @@ The `sort` function may invoke the following element access functions:
  preconditions specified in [[sort]]).
  - The user-provided `Compare` function object.

  — *end example*]

  ### Requirements on user-provided function objects <a id="algorithms.parallel.user">[[algorithms.parallel.user]]</a>

  Unless otherwise specified, function objects passed into parallel
  algorithms as objects of type `Predicate`, `BinaryPredicate`, `Compare`,
  `UnaryOperation`, `BinaryOperation`, `BinaryOperation1`,
  `BinaryOperation2`, and the operators used by the analogous overloads to
  these parallel algorithms that could be formed by the invocation with
  the specified default predicate or operation (where applicable) shall
  not directly or indirectly modify objects via their arguments, nor shall
- they rely on the identity of the provided objects..

  ### Effect of execution policies on algorithm execution <a id="algorithms.parallel.exec">[[algorithms.parallel.exec]]</a>

- Parallel algorithms have template parameters named `ExecutionPolicy` (
- [[execpol]]) which describe the manner in which the execution of these
  algorithms may be parallelized and the manner in which they apply the
  element access functions.

  Unless otherwise stated, implementations may make arbitrary copies of
  elements (with type `T`) from sequences where
  `is_trivially_copy_constructible_v<T>` and
  `is_trivially_destructible_v<T>` are `true`.

- [*Note 1*: This implies that user-supplied function objects should not
  rely on object identity of arguments for such input sequences. Users for
  whom the object identity of the arguments to these function objects is
  important should consider using a wrapping iterator that returns a
- non-copied implementation object such as `reference_wrapper<T>` (
- [[refwrap]]) or some equivalent solution. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
  `execution::sequenced_policy` all occur in the calling thread of
  execution.

- [*Note 2*: The invocations are not interleaved; see
  [[intro.execution]]. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
- `execution::parallel_policy` are permitted to execute in either the
  invoking thread of execution or in a thread of execution implicitly
  created by the library to support parallel algorithm execution. If the
- threads of execution created by `thread` ([[thread.thread.class]])
- provide concurrent forward progress guarantees ([[intro.progress]]),
- then a thread of execution implicitly created by the library will
- provide parallel forward progress guarantees; otherwise, the provided
- forward progress guarantee is *implementation-defined*. Any such
- invocations executing in the same thread of execution are
- indeterminately sequenced with respect to each other.

- [*Note 3*: It is the caller’s responsibility to ensure that the
  invocation does not introduce data races or deadlocks. — *end note*]

  [*Example 1*:

  ``` cpp
@@ -109,13 +170,13 @@ to the container `v`.

  ``` cpp
  std::atomic<int> x{0};
  int a[] = {1,2};
  std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
-   x.fetch_add(1, std::memory_order_relaxed);
    // spin wait for another iteration to change the value of x
-   while (x.load(std::memory_order_relaxed) == 1) { } // incorrect: assumes execution order
  });
  ```

  The above example depends on the order of execution of the iterations,
  and will not terminate if both iterations are executed sequentially on
@@ -139,78 +200,51 @@ The above example synchronizes access to object `x` ensuring that it is
  incremented correctly.

  — *end example*]

  The invocations of element access functions in parallel algorithms
- invoked with an execution policy of type
  `execution::parallel_unsequenced_policy` are permitted to execute in an
  unordered fashion in unspecified threads of execution, and unsequenced
  with respect to one another within each thread of execution. These
  threads of execution are either the invoking thread of execution or
  threads of execution implicitly created by the library; the latter will
  provide weakly parallel forward progress guarantees.

- [*Note 4*: This means that multiple function object invocations may be
  interleaved on a single thread of execution, which overrides the usual
  guarantee from [[intro.execution]] that function executions do not
- interleave with one another. — *end note*]

- Since `execution::parallel_unsequenced_policy` allows the execution of
- element access functions to be interleaved on a single thread of
- execution, blocking synchronization, including the use of mutexes, risks
- deadlock. Thus, the synchronization with
- `execution::parallel_unsequenced_policy` is restricted as follows: A
- standard library function is *vectorization-unsafe* if it is specified
- to synchronize with another function invocation, or another function
- invocation is specified to synchronize with it, and if it is not a
- memory allocation or deallocation function. Vectorization-unsafe
- standard library functions may not be invoked by user code called from
- `execution::parallel_unsequenced_policy` algorithms.

- [*Note 5*: Implementations must ensure that internal synchronization
- inside standard library functions does not prevent forward progress when
- those functions are executed by threads of execution with weakly
- parallel forward progress guarantees. — *end note*]

- [*Example 4*:
-
- ``` cpp
- int x = 0;
- std::mutex m;
- int a[] = {1,2};
- std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
-   std::lock_guard<mutex> guard(m); // incorrect: lock_guard constructor calls m.lock()
-   ++x;
- });
- ```
-
- The above program may result in two consecutive calls to `m.lock()` on
- the same thread of execution (which may deadlock), because the
- applications of the function object are not guaranteed to run on
- different threads of execution.
-
- — *end example*]
-
- [*Note 6*: The semantics of the `execution::parallel_policy` or the
- `execution::parallel_unsequenced_policy` invocation allow the
- implementation to fall back to sequential execution if the system cannot
- parallelize an algorithm invocation due to lack of
- resources. — *end note*]

  If an invocation of a parallel algorithm uses threads of execution
  implicitly created by the library, then the invoking thread of execution
  will either

- - temporarily block with forward progress guarantee delegation (
- [[intro.progress]]) on the completion of these library-managed threads
  of execution, or
  - eventually execute an element access function;

  the thread of execution will continue to do so until the algorithm is
  finished.

- [*Note 7*: In blocking with forward progress guarantee delegation in
  this context, a thread of execution created by the library is considered
  to have finished execution as soon as it has finished the execution of
  the particular element access function that the invoking thread of
  execution logically depends on. — *end note*]

@@ -248,7 +282,7 @@ says “at most *expr*” or “exactly *expr*” and does not specify the
  number of assignments or swaps, and *expr* is not already expressed with
  𝑂() notation, the complexity of the algorithm shall be
  𝑂(*expr*).

  Parallel algorithms shall not participate in overload resolution unless
- `is_execution_policy_v<decay_t<ExecutionPolicy>>` is `true`.

  ## Parallel algorithms <a id="algorithms.parallel">[[algorithms.parallel]]</a>

+ ### Preamble <a id="algorithms.parallel.defns">[[algorithms.parallel.defns]]</a>

+ Subclause [[algorithms.parallel]] describes components that C++ programs
+ may use to perform operations on containers and other sequences in
+ parallel.

+ A *parallel algorithm* is a function template listed in this document
+ with a template parameter named `ExecutionPolicy`.

  Parallel algorithms access objects indirectly accessible via their
  arguments by invoking the following functions:

  - All operations of the categories of the iterators that the algorithm

  - Operations on those sequence elements that are required by its
  specification.
  - User-provided function objects to be applied during the execution of
  the algorithm, if required by the specification.
  - Operations on those function objects required by the specification.
+ \[*Note 1*: See [[algorithms.requirements]]. — *end note*]

  These functions are herein called *element access functions*.

  [*Example 1*:

  preconditions specified in [[sort]]).
  - The user-provided `Compare` function object.

  — *end example*]

+ A standard library function is *vectorization-unsafe* if it is specified
+ to synchronize with another function invocation, or another function
+ invocation is specified to synchronize with it, and if it is not a
+ memory allocation or deallocation function.
+
+ [*Note 2*: Implementations must ensure that internal synchronization
+ inside standard library functions does not prevent forward progress when
+ those functions are executed by threads of execution with weakly
+ parallel forward progress guarantees. — *end note*]
+
+ [*Example 2*:
+
+ ``` cpp
+ int x = 0;
+ std::mutex m;
+ void f() {
+   int a[] = {1,2};
+   std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
+     std::lock_guard<std::mutex> guard(m); // incorrect: lock_guard constructor calls m.lock()
+     ++x;
+   });
+ }
+ ```
+
+ The above program may result in two consecutive calls to `m.lock()` on
+ the same thread of execution (which may deadlock), because the
+ applications of the function object are not guaranteed to run on
+ different threads of execution.
+
+ — *end example*]
+
  ### Requirements on user-provided function objects <a id="algorithms.parallel.user">[[algorithms.parallel.user]]</a>

  Unless otherwise specified, function objects passed into parallel
  algorithms as objects of type `Predicate`, `BinaryPredicate`, `Compare`,
  `UnaryOperation`, `BinaryOperation`, `BinaryOperation1`,
  `BinaryOperation2`, and the operators used by the analogous overloads to
  these parallel algorithms that could be formed by the invocation with
  the specified default predicate or operation (where applicable) shall
  not directly or indirectly modify objects via their arguments, nor shall
+ they rely on the identity of the provided objects.

  ### Effect of execution policies on algorithm execution <a id="algorithms.parallel.exec">[[algorithms.parallel.exec]]</a>

+ Parallel algorithms have template parameters named `ExecutionPolicy`
+ [[execpol]] which describe the manner in which the execution of these
  algorithms may be parallelized and the manner in which they apply the
  element access functions.

+ If an object is modified by an element access function, the algorithm
+ will perform no other unsynchronized accesses to that object. The
+ modifying element access functions are those which are specified as
+ modifying the object.
+
+ [*Note 1*: For example, `swap`, `++`, `--`, `@=`, and assignments
+ modify the object. For the assignment and `@=` operators, only the left
+ argument is modified. — *end note*]
+
  Unless otherwise stated, implementations may make arbitrary copies of
  elements (with type `T`) from sequences where
  `is_trivially_copy_constructible_v<T>` and
  `is_trivially_destructible_v<T>` are `true`.

+ [*Note 2*: This implies that user-supplied function objects should not
  rely on object identity of arguments for such input sequences. Users for
  whom the object identity of the arguments to these function objects is
  important should consider using a wrapping iterator that returns a
+ non-copied implementation object such as `reference_wrapper<T>`
+ [[refwrap]] or some equivalent solution. — *end note*]
 
  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
  `execution::sequenced_policy` all occur in the calling thread of
  execution.

+ [*Note 3*: The invocations are not interleaved; see
  [[intro.execution]]. — *end note*]

  The invocations of element access functions in parallel algorithms
  invoked with an execution policy object of type
+ `execution::unsequenced_policy` are permitted to execute in an unordered
+ fashion in the calling thread of execution, unsequenced with respect to
+ one another in the calling thread of execution.
+
+ [*Note 4*: This means that multiple function object invocations may be
  interleaved on a single thread of execution, which overrides the usual
  guarantee from [[intro.execution]] that function executions do not
+ overlap with one another. — *end note*]
+
+ The behavior of a program is undefined if it invokes a
+ vectorization-unsafe standard library function from user code called
+ from an `execution::unsequenced_policy` algorithm.
+
+ [*Note 5*: Because `execution::unsequenced_policy` allows the execution
+ of element access functions to be interleaved on a single thread of
+ execution, blocking synchronization, including the use of mutexes, risks
+ deadlock. — *end note*]
+
+ The invocations of element access functions in parallel algorithms
+ invoked with an execution policy object of type
+ `execution::parallel_policy` are permitted to execute either in the
  invoking thread of execution or in a thread of execution implicitly
  created by the library to support parallel algorithm execution. If the
+ threads of execution created by `thread` [[thread.thread.class]] or
+ `jthread` [[thread.jthread.class]] provide concurrent forward progress
+ guarantees [[intro.progress]], then a thread of execution implicitly
+ created by the library will provide parallel forward progress
+ guarantees; otherwise, the provided forward progress guarantee is
+ *implementation-defined*. Any such invocations executing in the same
+ thread of execution are indeterminately sequenced with respect to each
+ other.

+ [*Note 6*: It is the caller’s responsibility to ensure that the
  invocation does not introduce data races or deadlocks. — *end note*]

  [*Example 1*:

  ``` cpp

  ``` cpp
  std::atomic<int> x{0};
  int a[] = {1,2};
  std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
+   x.fetch_add(1, std::memory_order::relaxed);
    // spin wait for another iteration to change the value of x
+   while (x.load(std::memory_order::relaxed) == 1) { } // incorrect: assumes execution order
  });
  ```

  The above example depends on the order of execution of the iterations,
  and will not terminate if both iterations are executed sequentially on
  incremented correctly.

  — *end example*]
 
  The invocations of element access functions in parallel algorithms
+ invoked with an execution policy object of type
  `execution::parallel_unsequenced_policy` are permitted to execute in an
  unordered fashion in unspecified threads of execution, and unsequenced
  with respect to one another within each thread of execution. These
  threads of execution are either the invoking thread of execution or
  threads of execution implicitly created by the library; the latter will
  provide weakly parallel forward progress guarantees.

+ [*Note 7*: This means that multiple function object invocations may be
  interleaved on a single thread of execution, which overrides the usual
  guarantee from [[intro.execution]] that function executions do not
+ overlap with one another. — *end note*]

+ The behavior of a program is undefined if it invokes a
+ vectorization-unsafe standard library function from user code called
+ from an `execution::parallel_unsequenced_policy` algorithm.

+ [*Note 8*: Because `execution::parallel_unsequenced_policy` allows the
+ execution of element access functions to be interleaved on a single
+ thread of execution, blocking synchronization, including the use of
+ mutexes, risks deadlock. — *end note*]

+ [*Note 9*: The semantics of invocation with
+ `execution::unsequenced_policy`, `execution::parallel_policy`, or
+ `execution::parallel_unsequenced_policy` allow the implementation to
+ fall back to sequential execution if the system cannot parallelize an
+ algorithm invocation, e.g., due to lack of resources. — *end note*]
  If an invocation of a parallel algorithm uses threads of execution
  implicitly created by the library, then the invoking thread of execution
  will either

+ - temporarily block with forward progress guarantee delegation
+ [[intro.progress]] on the completion of these library-managed threads
  of execution, or
  - eventually execute an element access function;

  the thread of execution will continue to do so until the algorithm is
  finished.

+ [*Note 10*: In blocking with forward progress guarantee delegation in
  this context, a thread of execution created by the library is considered
  to have finished execution as soon as it has finished the execution of
  the particular element access function that the invoking thread of
  execution logically depends on. — *end note*]

  number of assignments or swaps, and *expr* is not already expressed with
  𝑂() notation, the complexity of the algorithm shall be
  𝑂(*expr*).

  Parallel algorithms shall not participate in overload resolution unless
+ `is_execution_policy_v<remove_cvref_t<ExecutionPolicy>>` is `true`.