- 19 Mar, 2020 1 commit
-
-
Committed by Sylwester Fraczek
-
- 18 Mar, 2020 1 commit
-
-
Committed by Zhang Ting
* remove unnecessary prepare data, test=develop
* Op in while block will not skip PrepareData, test=develop
-
- 17 Mar, 2020 2 commits
-
-
Committed by Adam
-
Committed by yaoxuefeng
-
- 13 Mar, 2020 1 commit
-
-
Committed by wangchaochaohu
* add fusion group test for backward and refine code
-
- 12 Mar, 2020 1 commit
-
-
Committed by wangchaochaohu
* add support for expression type convert and add cast Op support in fusion group
-
- 11 Mar, 2020 3 commits
-
-
Committed by Wilber
* add skip_layernorm pass. test=develop
-
Committed by Adam
-
Committed by Zhaolong Xing
* 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop
* 1. refine fc test=develop
* fix comments test=develop
* fix comments test=develop
-
- 09 Mar, 2020 2 commits
-
-
Committed by Zeng Jinle
* refine grad maker, test=develop
* refactor tracer stage 1, test=develop
* merge develop to solve conflict a third time, test=develop
-
Committed by liu zhengxi
* fix fc padding during fusion, test=develop
* fix optim model inference after SaveOptimModel, test=develop
-
- 07 Mar, 2020 1 commit
-
-
Committed by wangchaochaohu
* refine the profiler print test=develop
-
- 05 Mar, 2020 1 commit
-
-
Committed by hong
* reduce default attrs for dynamic graph, test=develop
* add some explanations for explicit attr, test=develop
* tweak explicit attr comments, test=develop
-
- 03 Mar, 2020 2 commits
-
-
Committed by Zhang Ting
-
Committed by Zhang Ting
* add fluid.device_guard to specify the device type for Op
-
- 02 Mar, 2020 2 commits
-
-
Committed by Zhen Wang
* update ScopeBufferedSSAGraphExecutor, AsyncSSAGraphExecutor, ThreadedSSAGraphExecutor, FastThreadedSSAGraphExecutor, ParallelSSAGraphExecutor and ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
-
Committed by hutuxian
* user can call dataset.set_download_cmd to set a customized download cmd
* add UT to cover this scenario
-
- 01 Mar, 2020 1 commit
-
-
Committed by wangchaochaohu
* Add the codegen and auto fusion for sum Op in fusion group
-
- 28 Feb, 2020 1 commit
-
-
Committed by tianshuo78520a
-
- 26 Feb, 2020 1 commit
-
-
Committed by Leo Chen
* support cond in clone, test=develop
* refine code, test=develop
* refine code, test=develop
* follow comments, test=develop
* refine code, test=develop
-
- 25 Feb, 2020 1 commit
-
-
Committed by hutuxian
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to denote whether the remaining instances are discarded when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse; it will be added back once it has been fully tested.
* Fix some known issues, such as copying persistable vars after one epoch of running.
-
- 24 Feb, 2020 1 commit
-
-
Committed by GaoWei8
* Add an interface for disabling FC padding
* fix bert regression
* polish fc padding interface
* recover pass function
* fix argument error
* fix mkldnn error
-
- 23 Feb, 2020 1 commit
-
-
Committed by tianshuo78520a
-
- 22 Feb, 2020 1 commit
-
-
Committed by tangwei12
* add sync communicator and implement
-
- 21 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
-
- 18 Feb, 2020 1 commit
-
-
Committed by wangchaochaohu
* add python flag to control profile level test=develop
-
- 17 Feb, 2020 1 commit
-
-
Committed by 123malin
-
- 15 Feb, 2020 1 commit
-
-
Committed by flame
-
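- 14 Feb, 2020 1 commit
-
-
Committed by Wilber
When a model contains multiple fc_lstm subgraphs whose fc ops share the same persistable bias, the bias node should not be deleted; only the non-persistable nodes should be removed.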
-
- 13 Feb, 2020 2 commits
-
-
Committed by Zhaolong Xing
* 1. optim multihead matmul: fuse three fc to multihead matmul test=develop
* fix conflict test=develop
* fix comments test=develop
-
Committed by Yiqun Liu
test=develop
-
- 12 Feb, 2020 1 commit
-
-
Committed by tangwei12
* add thread barrier for the compiled program
-
- 11 Feb, 2020 5 commits
-
-
Committed by hutuxian
Refine PaddleBox Framework. Main functions:
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse them in the push period.
-
Committed by yaoxuefeng
* update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * update style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop
-
Committed by zhaoyuchen2018
* Refine code, fix select tile error, test=develop
* Refine element type and some comments, test=develop
* Refine comments and gpu utils, test=develop
* Remove some useless condition
* Refine floor and ceil, test=develop
* refine for loop. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
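Committed by Wilber
Support compiling without depending on NCCL. [1/2] In a multi-card setup, if the build was not compiled with the WITH_NCCL switch turned on, the cards cannot communicate, so only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>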
-
Committed by guofei
This PR makes the assign op support LoDTensorArray and enables the loop_vars in while_loop to support tuple or list.
-
- 07 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE messages. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop
* Follow review comments. test=develop
-
- 06 Feb, 2020 1 commit
-
-
Committed by joanna.wozna.intel
* Add dequant scale squash test=develop
* Correct dequant-scale squash test test=develop
-
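- 05 Feb, 2020 1 commit
-
-
Committed by Wilber
Added a WITH_NCCL cmake option that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. A single-machine, single-card build can disable NCCL compilation; multi-card use needs NCCL enabled, which is the default. If NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>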
-