<span id="design-doc-execute-the-program-with-multi-cpu"></span><h1>Design Doc: Execute the Program with Multi CPU<a class="headerlink" href="#design-doc-execute-the-program-with-multi-cpu" title="Permalink to this headline">¶</a></h1>
<div class="section" id="abstract">
<span id="abstract"></span><h2>Abstract<a class="headerlink" href="#abstract" title="Permalink to this headline">¶</a></h2>
<p>This design doc proposes an approach to run the user-defined Op graph
on multiple CPUs. An auto transpiler converts the user-defined
Op graph into a multi-CPU Op graph, and the <code class="docutils literal"><span class="pre">ParallelDo</span></code> Op runs that graph.</p>
</div>
<div class="section" id="transpiler">
<span id="transpiler"></span><h2>Transpiler<a class="headerlink" href="#transpiler" title="Permalink to this headline">¶</a></h2>
</div>
<div class="section" id="implement">
<span id="implement"></span><h2>Implement<a class="headerlink" href="#implement" title="Permalink to this headline">¶</a></h2>
<ul>
<li><p class="first"><code class="docutils literal"><span class="pre">Multi-CPU</span> <span class="pre">Transpiler</span></code> converts the graph into a multi-CPU graph
that is executed with multiple threads.</p>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">BlockingCounter</span></code> will <code class="docutils literal"><span class="pre">Init/Decrement</span></code> an atomic counter, and block in <code class="docutils literal"><span class="pre">Wait</span></code>
until the atomic counter becomes <code class="docutils literal"><span class="pre">0</span></code>.</p>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">ParallelDo</span></code> Operator:</p>
<ul class="simple">
<li>Initialize a thread pool, which is a singleton.</li>
<li>Take a block id as the input, and run the specified Block on independent scopes
with multiple threads.</li>
<li>Initialize a <code class="docutils literal"><span class="pre">BlockingCounter</span></code> instance and wait until all threads are done.</li>
</ul>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">Split</span></code> Operator will split the Input Tensor into a TensorArray.</p>
</li>
<li><p class="first"><code class="docutils literal"><span class="pre">Merge</span></code> merges the gradients calculated in different threads
using a <code class="docutils literal"><span class="pre">mean/sum/max/min...</span></code> method, and then runs the Optimizer Op to optimize <code class="docutils literal"><span class="pre">W</span></code>.</p>
</li>
</ul>
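<p>The steps listed above can be sketched end to end in a few lines. The following is a minimal Python illustration, not the framework's implementation: the names <code class="docutils literal"><span class="pre">split</span></code>, <code class="docutils literal"><span class="pre">parallel_do</span></code>, <code class="docutils literal"><span class="pre">merge</span></code>, <code class="docutils literal"><span class="pre">toy_block</span></code> and <code class="docutils literal"><span class="pre">_POOL</span></code> are hypothetical, a plain dict stands in for a scope, a Python list stands in for a TensorArray, and the counter is protected by a lock rather than being a true atomic.</p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level pool, standing in for the singleton thread pool.
_POOL = ThreadPoolExecutor(max_workers=4)


class BlockingCounter:
    """Lock-protected counter; wait() blocks until it reaches zero."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def decrement(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            self._cond.wait_for(lambda: self._count <= 0)


def split(batch, num_parts):
    """Stand-in for the Split Op: cut a list into num_parts contiguous chunks."""
    size, extra = divmod(len(batch), num_parts)
    chunks, start = [], 0
    for i in range(num_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(batch[start:end])
        start = end
    return chunks


def parallel_do(block, chunks, pool=_POOL):
    """Run block once per chunk, each in its own scope, and wait for all runs."""
    scopes = [{"x": chunk} for chunk in chunks]  # one independent scope per run
    counter = BlockingCounter(len(scopes))

    def run(scope):
        try:
            block(scope)
        finally:
            counter.decrement()  # always signal completion, even on error

    for scope in scopes:
        pool.submit(run, scope)
    counter.wait()
    return scopes


def merge(grads, method="mean"):
    """Stand-in for the Merge Op: reduce per-thread gradients to one value."""
    reducers = {
        "sum": sum,
        "mean": lambda g: sum(g) / len(g),
        "max": max,
        "min": min,
    }
    return reducers[method](grads)


def toy_block(scope):
    """Toy block (an assumption for illustration): the 'gradient' is the chunk mean."""
    scope["grad"] = sum(scope["x"]) / len(scope["x"])


if __name__ == "__main__":
    chunks = split([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 4)
    scopes = parallel_do(toy_block, chunks)
    grad = merge([s["grad"] for s in scopes], "mean")
    w = 1.0 - 0.1 * grad  # toy optimizer step on W
    print(grad, w)
```

<p>In this sketch each run writes only to its own scope, so the merge step is the only place where results from different threads meet. That mirrors the intent of running the Block on independent scopes.</p>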
</div>
<div class="section" id="todo">
<span id="todo"></span><h2>TODO<a class="headerlink" href="#todo" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>Improve the optimizer stage with multiple threads, since we could
assign the parameters to different threads and execute</li>
</ul>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.