From Jason Turner

[algorithms.parallel]

## Parallel algorithms <a id="algorithms.parallel">[[algorithms.parallel]]</a>

This section describes components that C++ programs may use to perform
operations on containers and other sequences in parallel.

### Terms and definitions <a id="algorithms.parallel.defns">[[algorithms.parallel.defns]]</a>

A *parallel algorithm* is a function template listed in this
International Standard with a template parameter named
`ExecutionPolicy`.

Parallel algorithms access objects indirectly accessible via their
arguments by invoking the following functions:

- All operations of the categories of the iterators that the algorithm
  is instantiated with.
- Operations on those sequence elements that are required by its
  specification.
- User-provided function objects to be applied during the execution of
  the algorithm, if required by the specification.
- Operations on those function objects required by the specification.
  \[*Note 1*: See [[algorithms.general]]. — *end note*]

These functions are herein called *element access functions*.

[*Example 1*:

The `sort` function may invoke the following element access functions:

- Operations of the random-access iterator of the actual template
  argument (as per [[random.access.iterators]]), as implied by the name
  of the template parameter `RandomAccessIterator`.
- The `swap` function on the elements of the sequence (as per the
  preconditions specified in [[sort]]).
- The user-provided `Compare` function object.

— *end example*]

### Requirements on user-provided function objects <a id="algorithms.parallel.user">[[algorithms.parallel.user]]</a>

Unless otherwise specified, function objects passed into parallel
algorithms as objects of type `Predicate`, `BinaryPredicate`, `Compare`,
`UnaryOperation`, `BinaryOperation`, `BinaryOperation1`,
`BinaryOperation2`, and the operators used by the analogous overloads to
these parallel algorithms that could be formed by the invocation with
the specified default predicate or operation (where applicable) shall
not directly or indirectly modify objects via their arguments, nor shall
they rely on the identity of the provided objects.

### Effect of execution policies on algorithm execution <a id="algorithms.parallel.exec">[[algorithms.parallel.exec]]</a>

Parallel algorithms have template parameters named `ExecutionPolicy`
([[execpol]]) which describe the manner in which the execution of these
algorithms may be parallelized and the manner in which they apply the
element access functions.

Unless otherwise stated, implementations may make arbitrary copies of
elements (with type `T`) from sequences where
`is_trivially_copy_constructible_v<T>` and
`is_trivially_destructible_v<T>` are `true`.

[*Note 1*: This implies that user-supplied function objects should not
rely on object identity of arguments for such input sequences. Users for
whom the object identity of the arguments to these function objects is
important should consider using a wrapping iterator that returns a
non-copied implementation object such as `reference_wrapper<T>`
([[refwrap]]) or some equivalent solution. — *end note*]

The invocations of element access functions in parallel algorithms
invoked with an execution policy object of type
`execution::sequenced_policy` all occur in the calling thread of
execution.

[*Note 2*: The invocations are not interleaved; see
[[intro.execution]]. — *end note*]

The invocations of element access functions in parallel algorithms
invoked with an execution policy object of type
`execution::parallel_policy` are permitted to execute in either the
invoking thread of execution or in a thread of execution implicitly
created by the library to support parallel algorithm execution. If the
threads of execution created by `thread` ([[thread.thread.class]])
provide concurrent forward progress guarantees ([[intro.progress]]),
then a thread of execution implicitly created by the library will
provide parallel forward progress guarantees; otherwise, the provided
forward progress guarantee is *implementation-defined*. Any such
invocations executing in the same thread of execution are
indeterminately sequenced with respect to each other.

[*Note 3*: It is the caller’s responsibility to ensure that the
invocation does not introduce data races or deadlocks. — *end note*]

[*Example 1*:

``` cpp
int a[] = {0,1};
std::vector<int> v;
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int i) {
  v.push_back(i*2+1); // incorrect: data race
});
```

The program above has a data race because of the unsynchronized access
to the container `v`.

— *end example*]

[*Example 2*:

``` cpp
std::atomic<int> x{0};
int a[] = {1,2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  x.fetch_add(1, std::memory_order_relaxed);
  // spin wait for another iteration to change the value of x
  while (x.load(std::memory_order_relaxed) == 1) { } // incorrect: assumes execution order
});
```

The above example depends on the order of execution of the iterations,
and will not terminate if both iterations are executed sequentially on
the same thread of execution.

— *end example*]

[*Example 3*:

``` cpp
int x = 0;
std::mutex m;
int a[] = {1,2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  std::lock_guard<std::mutex> guard(m);
  ++x;
});
```

The above example synchronizes access to object `x` ensuring that it is
incremented correctly.

— *end example*]

The invocations of element access functions in parallel algorithms
invoked with an execution policy of type
`execution::parallel_unsequenced_policy` are permitted to execute in an
unordered fashion in unspecified threads of execution, and unsequenced
with respect to one another within each thread of execution. These
threads of execution are either the invoking thread of execution or
threads of execution implicitly created by the library; the latter will
provide weakly parallel forward progress guarantees.

[*Note 4*: This means that multiple function object invocations may be
interleaved on a single thread of execution, which overrides the usual
guarantee from [[intro.execution]] that function executions do not
interleave with one another. — *end note*]

Since `execution::parallel_unsequenced_policy` allows the execution of
element access functions to be interleaved on a single thread of
execution, blocking synchronization, including the use of mutexes, risks
deadlock. Thus, the synchronization with
`execution::parallel_unsequenced_policy` is restricted as follows: A
standard library function is *vectorization-unsafe* if it is specified
to synchronize with another function invocation, or another function
invocation is specified to synchronize with it, and if it is not a
memory allocation or deallocation function. Vectorization-unsafe
standard library functions may not be invoked by user code called from
`execution::parallel_unsequenced_policy` algorithms.

[*Note 5*: Implementations must ensure that internal synchronization
inside standard library functions does not prevent forward progress when
those functions are executed by threads of execution with weakly
parallel forward progress guarantees. — *end note*]

[*Example 4*:

``` cpp
int x = 0;
std::mutex m;
int a[] = {1,2};
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
  std::lock_guard<std::mutex> guard(m); // incorrect: lock_guard constructor calls m.lock()
  ++x;
});
```

The above program may result in two consecutive calls to `m.lock()` on
the same thread of execution (which may deadlock), because the
applications of the function object are not guaranteed to run on
different threads of execution.

— *end example*]

[*Note 6*: The semantics of the `execution::parallel_policy` or the
`execution::parallel_unsequenced_policy` invocation allow the
implementation to fall back to sequential execution if the system cannot
parallelize an algorithm invocation due to lack of
resources. — *end note*]

If an invocation of a parallel algorithm uses threads of execution
implicitly created by the library, then the invoking thread of execution
will either

- temporarily block with forward progress guarantee delegation
  ([[intro.progress]]) on the completion of these library-managed threads
  of execution, or
- eventually execute an element access function;

the thread of execution will continue to do so until the algorithm is
finished.

[*Note 7*: In blocking with forward progress guarantee delegation in
this context, a thread of execution created by the library is considered
to have finished execution as soon as it has finished the execution of
the particular element access function that the invoking thread of
execution logically depends on. — *end note*]

The semantics of parallel algorithms invoked with an execution policy
object of *implementation-defined* type are *implementation-defined*.

### Parallel algorithm exceptions <a id="algorithms.parallel.exceptions">[[algorithms.parallel.exceptions]]</a>

During the execution of a parallel algorithm, if temporary memory
resources are required for parallelization and none are available, the
algorithm throws a `bad_alloc` exception.

During the execution of a parallel algorithm, if the invocation of an
element access function exits via an uncaught exception, the behavior is
determined by the `ExecutionPolicy`.

### `ExecutionPolicy` algorithm overloads <a id="algorithms.parallel.overloads">[[algorithms.parallel.overloads]]</a>

Parallel algorithms are algorithm overloads. Each parallel algorithm
overload has an additional template type parameter named
`ExecutionPolicy`, which is the first template parameter. Additionally,
each parallel algorithm overload has an additional function parameter of
type `ExecutionPolicy&&`, which is the first function parameter.

[*Note 1*: Not all algorithms have parallel algorithm
overloads. — *end note*]

Unless otherwise specified, the semantics of `ExecutionPolicy` algorithm
overloads are identical to their overloads without an execution policy.

Unless otherwise specified, the complexity requirements of
`ExecutionPolicy` algorithm overloads are relaxed from the complexity
requirements of the overloads without an execution policy as follows:
when the guarantee says “at most *expr*” or “exactly *expr*” and does
not specify the number of assignments or swaps, and *expr* is not
already expressed with 𝑂() notation, the complexity of the algorithm
shall be 𝑂(*expr*).

Parallel algorithms shall not participate in overload resolution unless
`is_execution_policy_v<decay_t<ExecutionPolicy>>` is `true`.