Design Doc: Execute the Program with Multi CPU
Abstract
This design doc proposes an approach for running a user-defined Op graph
on multiple CPUs. An auto transpiler converts the user-defined Op graph
to a multi-CPU Op graph, and the ParallelDo Op runs that graph.
Implementation
- Multi-CPU Transpiler will convert the graph to a multi-CPU graph, which is executed with multiple threads.
- BlockingCounter will Init/Decrement an atomic counter, and a blocking Wait will block until the atomic counter becomes 0:

    BlockingCounter bc(thread_count);
    for (int i = 0; i < thread_count; ++i) {
      thread_pool->Start([&bc] { bc.DecrementCount(); });
    }
    bc.Wait();

- ParallelDo Operator
  - Initialize a thread pool, which is a singleton.
  - Use a block id as the input, and run the specified Block in an independent scope with multiple threads.
  - Initialize a BlockingCounter instance and wait until all threads are done.
- Split Operator will split the Input Tensor into a TensorArray.
- Merge merges all the gradients calculated in different threads with a mean/sum/max/min... method, and then runs the Optimizer Op to optimize W.
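
The pieces listed above can be sketched together in a few lines of standard C++. This is a minimal illustration under stated assumptions, not the actual implementation: the BlockingCounter here uses a mutex-guarded integer with a condition variable rather than a lock-free atomic, plain std::thread objects stand in for the singleton thread pool, a vector of float vectors stands in for the TensorArray, and the free functions Split, ParallelDo and MergeMean are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for BlockingCounter: Wait() blocks until the
// counter has been decremented to 0.
class BlockingCounter {
 public:
  explicit BlockingCounter(int count) : count_(count) { assert(count >= 0); }

  // Called once by each worker when its block has finished.
  void DecrementCount() {
    std::lock_guard<std::mutex> lock(mu_);
    if (--count_ == 0) cv_.notify_all();
  }

  // Blocks the caller until the counter reaches 0.
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return count_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_;
};

// Split: partition the input into `parts` contiguous chunks
// (an assumed stand-in for a TensorArray).
std::vector<std::vector<float>> Split(const std::vector<float>& input,
                                      std::size_t parts) {
  assert(parts > 0);
  std::vector<std::vector<float>> chunks(parts);
  const std::size_t base = input.size() / parts;
  const std::size_t extra = input.size() % parts;
  std::size_t begin = 0;
  for (std::size_t i = 0; i < parts; ++i) {
    const std::size_t len = base + (i < extra ? 1 : 0);
    chunks[i].assign(input.begin() + begin, input.begin() + begin + len);
    begin += len;
  }
  return chunks;
}

// ParallelDo: run `block` on every chunk in its own thread. Each thread
// writes only to its own result slot, which plays the role of an
// independent scope.
std::vector<float> ParallelDo(
    const std::vector<std::vector<float>>& chunks,
    const std::function<float(const std::vector<float>&)>& block) {
  const int n = static_cast<int>(chunks.size());
  std::vector<float> results(chunks.size(), 0.0f);
  BlockingCounter bc(n);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&, i] {
      results[i] = block(chunks[i]);
      bc.DecrementCount();
    });
  }
  bc.Wait();  // every block has finished once this returns
  for (auto& t : workers) t.join();
  return results;
}

// Merge with the "mean" method: average the per-thread values.
float MergeMean(const std::vector<float>& per_thread) {
  assert(!per_thread.empty());
  float sum = 0.0f;
  for (float v : per_thread) sum += v;
  return sum / static_cast<float>(per_thread.size());
}
```

A caller would chain the three steps, for example MergeMean(ParallelDo(Split(input, 4), block)), where block computes a per-chunk value such as a gradient. On some toolchains the program has to be linked with the platform's thread library (for example -pthread).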
TODO
- Improve the optimizer stage with multiple threads: the parameters could be assigned to different threads so that the optimizer runs in parallel.
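
One way this TODO could look is sketched below. It is an illustration only: the Param struct, the SgdStep update rule and the round-robin assignment in ParallelOptimize are all assumptions made for the example, and the design doc does not prescribe an optimizer or an assignment policy. Because each parameter is owned by exactly one thread, the updates need no locking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parameter: a flat weight buffer and its merged gradient.
struct Param {
  std::vector<float> w;
  std::vector<float> grad;
};

// Plain SGD update for one parameter: w -= lr * grad.
void SgdStep(Param* p, float lr) {
  assert(p->w.size() == p->grad.size());
  for (std::size_t i = 0; i < p->w.size(); ++i) {
    p->w[i] -= lr * p->grad[i];
  }
}

// Assign parameter i to thread (i % num_threads) and update all
// parameters in parallel. No two threads touch the same parameter.
void ParallelOptimize(std::vector<Param>* params, float lr,
                      std::size_t num_threads) {
  assert(num_threads > 0);
  // Never start more threads than there are parameters.
  num_threads =
      std::min(num_threads, std::max<std::size_t>(params->size(), 1));
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([params, lr, t, num_threads] {
      for (std::size_t i = t; i < params->size(); i += num_threads) {
        SgdStep(&(*params)[i], lr);
      }
    });
  }
  for (auto& th : workers) th.join();
}
```

Round-robin assignment is the simplest choice; if parameter sizes differ widely, balancing by element count would likely spread the work more evenly.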


