Design Doc: Execute the Program with Multiple CPUs
Abstract
This design doc proposes an approach for running a user-defined Op graph on
multiple CPUs. An automatic transpiler converts the user-defined Op graph into a
multi-CPU Op graph, and a ParallelDo Op executes that graph.
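A toy version of this graph rewrite makes the idea more concrete. It is a sketch under strong assumptions and not the actual transpiler. Op, Program, IsOptimizerOp and TranspileToMultiCPU are hypothetical names. A program is modeled as flat lists of ops, and optimizer ops are recognized only by their type name. The layout it produces (split, parallel_do, merge, then the optimizer ops) follows the components described in this doc.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified program model: a block is a flat list of ops,
// and a parallel_do op refers to the sub-block it runs by index.
struct Op {
  std::string type;
  int sub_block;  // index of the block run by a parallel_do op, -1 otherwise
};

struct Program {
  std::vector<std::vector<Op>> blocks;  // blocks[0] is the global block
};

// Assumption of this sketch: optimizer ops are recognized by their type name.
bool IsOptimizerOp(const Op& op) {
  static const std::set<std::string> kOptimizerOps = {"sgd", "momentum", "adam"};
  return kOptimizerOps.count(op.type) > 0;
}

// Rewrites a single-block program: every non-optimizer op (forward and backward)
// moves into a new sub-block that parallel_do runs on every thread, split shards
// the input in front of it, merge combines the per-thread gradients after it,
// and the optimizer ops stay in the global block, after merge.
Program TranspileToMultiCPU(const Program& single) {
  assert(single.blocks.size() == 1);
  std::vector<Op> compute, optimize;
  for (const Op& op : single.blocks[0]) {
    (IsOptimizerOp(op) ? optimize : compute).push_back(op);
  }
  Program multi;
  multi.blocks.resize(2);
  multi.blocks[1] = compute;  // the sub-block executed by each thread
  multi.blocks[0] = {{"split", -1}, {"parallel_do", 1}, {"merge", -1}};
  multi.blocks[0].insert(multi.blocks[0].end(), optimize.begin(), optimize.end());
  return multi;
}
```

Consider a program consisting of mul, mul_grad and sgd. The pass turns it into split, parallel_do (running block 1: mul, mul_grad), merge, and finally sgd.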
Implementation
- Multi-CPU Transpiler will convert the graph into a multi-CPU graph, which is executed by multiple threads.
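- BlockingCounter will Init/Decrement an atomic counter, and Wait blocks until the counter becomes 0:
  ```cpp
  // One count per task; each task decrements the counter when it finishes.
  BlockingCounter bc(thread_count);
  for (int i = 0; i < thread_count; ++i) {
    thread_pool->Start([&bc] { bc.DecrementCount(); });
  }
  // Block until every task has called DecrementCount().
  bc.Wait();
  ```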
- ParallelDo Operator
  - Initialize a thread pool, which is a singleton.
  - Take a block id as input and run the specified Block on multiple threads, each with its own independent scope.
  - Initialize a BlockingCounter instance and wait until all threads are done.
- Split Operator will split the input Tensor into a TensorArray.
- Merge will combine the gradients calculated in the different threads using a mean/sum/max/min... method, and then run the Optimizer Op to optimize W.
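The components above can be combined into a minimal, self-contained sketch. It is not the framework's implementation, and the following simplifications are assumptions:

- Tensor is modeled as std::vector<float>.
- Scope is a struct holding one input shard and one scalar gradient.
- ParallelDo receives the block as a callable instead of a block id.
- ThreadPool, Split, ParallelDo, Merge and MergeMethod are hypothetical stand-ins.
- Unlike the description above, this BlockingCounter guards a plain counter with a mutex rather than using a bare atomic, so that Wait can sleep on a std::condition_variable.
- Only sum and mean merging are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// BlockingCounter: Wait() returns once DecrementCount() has been called `count` times.
class BlockingCounter {
 public:
  explicit BlockingCounter(int count) : count_(count) {}
  void DecrementCount() {
    std::lock_guard<std::mutex> lock(mu_);
    if (--count_ == 0) cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return count_ == 0; });
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_;
};

// Minimal process-wide (singleton) thread pool; its workers live until program exit.
class ThreadPool {
 public:
  static ThreadPool& Instance() {
    static ThreadPool pool(std::max(2u, std::thread::hardware_concurrency()));
    return pool;
  }
  void Start(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push(std::move(task)); }
    cv_.notify_one();
  }
  ~ThreadPool() {
    { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
    cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }
 private:
  explicit ThreadPool(unsigned n) {
    for (unsigned i = 0; i < n; ++i) workers_.emplace_back([this] { Loop(); });
  }
  void Loop() {
    for (;;) {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // stop_ is set and the queue is drained
      std::function<void()> task = std::move(tasks_.front()); tasks_.pop();
      lock.unlock();
      task();  // run the task without holding the lock
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

using Tensor = std::vector<float>;
using TensorArray = std::vector<Tensor>;
struct Scope { Tensor input; float grad = 0.f; };  // one independent scope per thread

// Split: cut the input along its first dimension into n contiguous shards.
TensorArray Split(const Tensor& input, std::size_t n) {
  assert(n > 0);
  TensorArray shards(n);
  for (std::size_t i = 0; i < input.size(); ++i) shards[i * n / input.size()].push_back(input[i]);
  return shards;
}

// ParallelDo: run `block` once per shard on the pool, each in its own Scope, then wait.
std::vector<Scope> ParallelDo(const TensorArray& shards, const std::function<void(Scope&)>& block) {
  std::vector<Scope> scopes(shards.size());
  BlockingCounter bc(static_cast<int>(shards.size()));
  for (std::size_t i = 0; i < shards.size(); ++i) {
    scopes[i].input = shards[i];
    ThreadPool::Instance().Start([&scopes, &block, &bc, i] {
      block(scopes[i]);
      bc.DecrementCount();
    });
  }
  bc.Wait();
  return scopes;
}

enum class MergeMethod { kSum, kMean };
// Merge: reduce the per-thread gradients to one value (max/min would work the same way).
float Merge(const std::vector<Scope>& scopes, MergeMethod method) {
  assert(!scopes.empty());
  float sum = 0.f;
  for (const Scope& s : scopes) sum += s.grad;
  return method == MergeMethod::kMean ? sum / scopes.size() : sum;
}
```

In this example, the values 1 to 10 are split into 4 shards, and each thread sums its own shard into grad. Merge with MergeMethod::kSum then returns 55. A real Merge would reduce whole gradient tensors, and its result would feed the Optimizer Op that updates W.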
TODO
- Parallelize the optimizer stage as well: the parameters could be assigned to different threads, so that the optimizer runs on multiple threads.
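The TODO item can be sketched under several assumptions:

- Param and ParallelSGD are hypothetical names.
- The merged gradient of each parameter is already available.
- The update rule is plain SGD (w -= lr * g).
- Parameters are assigned to threads round-robin.

For brevity the sketch starts std::thread workers directly rather than reusing the singleton thread pool described under ParallelDo.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parameter representation: a value and its already-merged gradient.
struct Param {
  std::vector<float> value;
  std::vector<float> grad;
};

// Assign parameters to threads round-robin (parameter i goes to thread i % num_threads);
// each thread applies an SGD update only to the parameters it owns.
void ParallelSGD(std::vector<Param>& params, float lr, unsigned num_threads) {
  assert(num_threads > 0);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&params, lr, num_threads, t] {
      for (std::size_t i = t; i < params.size(); i += num_threads) {
        Param& p = params[i];
        assert(p.value.size() == p.grad.size());
        for (std::size_t j = 0; j < p.value.size(); ++j) p.value[j] -= lr * p.grad[j];
      }
    });
  }
  for (std::thread& w : workers) w.join();  // the optimizer stage ends when all threads finish
}
```

Each parameter is owned by exactly one thread, so the updates need no locking. Balancing the assignment by parameter size instead of round-robin would be one possible refinement.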


