- 13 May, 2019 1 commit
-
-
Committed by Yiqun Liu
* Optimize the elementwise op with CUDA kernels. test=develop
* Support setting attr in the op config file. test=develop
* Add support for setting dtype and initializer in config. test=develop
* Save workspace.
* Add initializer "zeros". test=develop
* Fix compiling error.
* Support using an existing file to initialize a tensor in op_tester.
* Use Eigen to optimize elementwise_add/mul for the case where x and y have the same dims. test=develop
-
- 07 March, 2019 2 commits
- 26 February, 2019 1 commit
-
-
Committed by Yiqun Liu
Optimize the CUDA implementation of the sequence_expand op by reducing the number of times lod data is copied from CPU to GPU. (#15493)
* Optimize the CUDA implementation of the sequence_expand op by reducing the number of times lod data is copied from CPU to GPU. test=develop
* Refine the op benchmark to support setting lod in config. test=develop
-
- 22 February, 2019 1 commit
-
-
Committed by Yiqun Liu
* Initialize the benchmark tester for operators. test=develop
* Rearrange the code. test=develop
-
- 20 December, 2018 1 commit
-
-
Committed by xiaoli.liu@intel.com
test=develop
-
- 16 November, 2018 1 commit
-
-
Committed by Wu Yi
* wip simplify operator framework
* wip
* wip
* done test=develop
* clean test=develop
* fix test=develop
* fix deps test=develop
* fix cpu build test=develop
* fix tensorrt build test=develop
* fix tests test=develop
* fix test=develop
* fix cpu build test=develop
-
- 27 September, 2018 1 commit
-
-
Committed by Jacek Czaja
- Added draft of new operator
- Added fused embedding fc lstm files
- First time embedding_fc_lstm_fuse_pass was invoked in test_text_classification
- Added Embedding pattern
- Not crashing
- Enabled draft of embedding_fc_lstm pass (does its job)
- First working (Seqcompute only) version
- Removed diagnostic comment
- First enabling of BatchCompute
- Disabling pass for embedding with is_sparse and is_distributed
- Cosmetics
- Style
- Style
-
- 22 August, 2018 2 commits
-
-
Committed by tensor-tang
-
Committed by tensor-tang
-
- 15 August, 2018 2 commits
-
-
Committed by tensor-tang
-
Committed by tensor-tang
-
- 08 May, 2018 1 commit
-
-
Committed by Yu Yang
Do not use ctor
* Reduce lines of code.
* We can use a virtual function for Maker now.
* The implementation does not care what the maker holds, so it is easier to refactor later.
-
- 03 April, 2018 2 commits
-
-
Committed by mozga-intel
-
Committed by mozga-intel
-