Created by: reyoung
I just added a dependency engine that parses the dependencies between operators. A lot of work still needs to be done:
- ~Complete Broadcast parameters.~
- ~Use a thread pool to invoke operators in parallel.~
- ~Complete NCCL AllReduce OpHandle in this implementation.~
- Add fetch methods.
I use `VarHandle` and `OpHandle` to parse a `Program` into an SSA-form graph. Each variable is assigned by exactly one `OpHandle`. When all inputs of an `OpHandle` are ready, that `OpHandle` can be run.
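
The scheduling rule above can be illustrated with a minimal Python sketch. It is not the code in this PR: the field names (`generated_op`, `pending_ops`) and the helpers `connect` and `run_graph` are hypothetical, and the sketch runs ops sequentially rather than on a thread pool. It only shows the idea that each variable has a single writer and that an op becomes runnable once all of its inputs are ready.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)
class VarHandle:
    """One SSA version of a variable (hypothetical layout)."""
    name: str
    version: int = 0
    generated_op: Optional["OpHandle"] = None  # the single op that writes it
    pending_ops: List["OpHandle"] = field(default_factory=list)  # its readers


@dataclass(eq=False)
class OpHandle:
    """A node that reads some variables and writes others."""
    name: str
    run: Callable[[], None]
    inputs: List[VarHandle] = field(default_factory=list)
    outputs: List[VarHandle] = field(default_factory=list)


def connect(op: OpHandle, inputs: List[VarHandle], outputs: List[VarHandle]) -> None:
    """Wire an op into the graph, enforcing the single-assignment rule."""
    for var in inputs:
        op.inputs.append(var)
        var.pending_ops.append(op)
    for var in outputs:
        if var.generated_op is not None:
            raise ValueError(f"{var.name} is already assigned by {var.generated_op.name}")
        var.generated_op = op
        op.outputs.append(var)


def run_graph(ops: List[OpHandle]) -> List[str]:
    """Run every op once all of its inputs are ready; return the execution order."""
    # Inputs without a generating op (e.g. parameters) are ready from the start.
    pending: Dict[OpHandle, int] = {
        op: sum(1 for var in op.inputs if var.generated_op is not None) for op in ops
    }
    ready = deque(op for op in ops if pending[op] == 0)
    order: List[str] = []
    while ready:
        op = ready.popleft()
        op.run()
        order.append(op.name)
        # Every output of the finished op is now ready; wake up its readers.
        for var in op.outputs:
            for reader in var.pending_ops:
                pending[reader] -= 1
                if pending[reader] == 0:
                    ready.append(reader)
    if len(order) != len(ops):
        raise RuntimeError("some ops never became ready (cycle or missing producer)")
    return order


if __name__ == "__main__":
    x, w, h, y = (VarHandle(n) for n in ("x", "w", "h", "y"))
    mul = OpHandle("mul", run=lambda: None)
    relu = OpHandle("relu", run=lambda: None)
    connect(mul, inputs=[x, w], outputs=[h])
    connect(relu, inputs=[h], outputs=[y])
    # The list order does not matter; dependencies decide the execution order.
    print(run_graph([relu, mul]))
```

In a parallel version, the `ready` queue would feed a thread pool instead of being drained in a loop, and the pending counters would need to be updated atomically.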
The speed of ResNeXt152 is as follows:
| Number of GPUs | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Images/sec | 18.639 | 27.8863 | 39.3787 | 52.9688 |
| Speedup (vs. 1 GPU) | N/A | 1.4961264 | 2.11270454 | 2.84182628 |
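
For reference, the speedup row is each throughput divided by the single-GPU throughput. The short Python sketch below recomputes it from the measured numbers in the table. The function names are illustrative, and the per-GPU efficiency (speedup divided by GPU count) is a derived figure that was not part of the original measurements.

```python
# Measured throughput in images per second, keyed by number of GPUs (from the table).
throughput = {1: 18.639, 2: 27.8863, 3: 39.3787, 4: 52.9688}


def speedup(rates, baseline_gpus=1):
    """Throughput of each configuration relative to the baseline configuration."""
    base = rates[baseline_gpus]
    return {gpus: rate / base for gpus, rate in rates.items()}


def efficiency(rates, baseline_gpus=1):
    """Speedup divided by GPU count; 1.0 would mean perfectly linear scaling."""
    return {gpus: s / gpus for gpus, s in speedup(rates, baseline_gpus).items()}


if __name__ == "__main__":
    for gpus, s in speedup(throughput).items():
        print(f"{gpus} GPU(s): speedup {s:.4f}, efficiency {efficiency(throughput)[gpus]:.2%}")
```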