- 21 10月, 2021 5 次提交
-
-
由 niuliling123 提交于
* Update the implement of reduceAnyKernel according to kernel primitive api * Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
-
由 seemingwang 提交于
-
由 zhaocaibei123 提交于
* add ctr table depends * code style * fix * fix * fix naming * rename * rename
-
由 liutiexing 提交于
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2. * adjust multithread using, fix flame graph * update
-
由 Aurelius84 提交于
* Add kQueueSync.synchronize_run_ logic * Support No DataTransform From GetKernelTypeForVar
-
- 20 10月, 2021 13 次提交
-
-
由 danleifeng 提交于
* split into PreBuildTask and BuildPull; slove endpass bug;test=develop * change buildcpu into prebuild and buildcpu into build;test=develop
-
由 李季 提交于
* fix global gather and global scatter operators
-
由 ronnywang 提交于
-
由 Wilber 提交于
-
由 Steffy-zxf 提交于
Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent. * support the text string as an input Tensor * support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization. * It first applies basic tokenization, followed by wordpiece tokenization.
-
由 wuhuachaocoding 提交于
-
由 Zeng Jinle 提交于
-
由 Wilber 提交于
-
由 Wilber 提交于
-
由 zmx 提交于
* bug fix for DeserializeSelectedRows. test=develop * fix bug for SerializeSelectedRows. test=develop * update. test=develop
-
由 Huihuang Zheng 提交于
Add CINN compile option in CMake. Now you can use CINN in Paddle by `-DWITH_CINN=ON` when `cmake` To test it, you can run `make cinn_lib_test -j` and `ctest -R cinn_lib_test`. Note: 1. You should set ``` export runtime_include_dir=${CINN_SOURCE_DIR}/cinn/runtime/cuda ``` When run test, the `${CINN_SOURCE_DIR}` should be set based on your CINN directory. 2. CINN is under developing now, you may have to change `CINN_GIT_TAG` to the git commit you need.
-
由 wenbin 提交于
* fix * remove const
-
由 Aurelius84 提交于
-
- 19 10月, 2021 13 次提交
-
-
由 Weilong Wu 提交于
* Support elementwise_add triple grad Kernel * Change code-format to follow CI std
-
由 zhulei 提交于
* [NPU] Add iou_similarity op * [NPU] Add iou_similarity op * [NPU] Add iou_similarity op
-
由 Qi Li 提交于
* [NPU] update inference cmake, test=develop * address review comments, test=develop * fix compile error when WITH_ASCEND_CXX11 ON, test=develop
-
由 danleifeng 提交于
-
由 Wilber 提交于
* update * fix ut error * update ut
-
由 jiangcheng 提交于
* add feed op and new var for the generated subgraph * perfect the test script of build_cinn_pass * remove useless clear and perfect some annotation
-
由 wangxinxin08 提交于
* add nearest_interp_v2 trt plugin
-
由 WangXi 提交于
-
由 littletomatodonkey 提交于
* fix replicate pad when input size is 0 * add unit test
-
由 Yulong Ao 提交于
* Add QR decomposition op * Change codes to adapt to new svd_helper * Update linalg.py Restore the deleted comma * Restore the deleted line * Update linalg.py * Update linalg.py * Improve the qr code by reviews * Update QR based on CI results * Update qr doc, test=document_fix * Change unsafe and ill-formed codes
-
由 Xiaoxu Chen 提交于
-
由 zmx 提交于
-
由 Zeng Jinle 提交于
* add pow2_warmup op * remove contrib __all__ * add AttrT * rename * follow comments * fix duplicate PADDLE_RESTRICT
-
- 18 10月, 2021 8 次提交
-
-
由 jakpiase 提交于
* added softplus * refactored softplus op * deleted unnecessary file * added missing file * added formatting * disabled tests if GPU is used * added reviewer suggestion * unified softplus kernel
-
由 xiaoxiaohehe001 提交于
* add_quant_axis * add_quant_axis * --amend * Update quant_conv2d_dequant_fuse_pass.cc
-
由 Qi Li 提交于
-
由 Qi Li 提交于
-
由 Siming Dai 提交于
* fix async_read bug * change index place to cpu * add tensor size judge * add async_read & async_write test * fix bug in async_write * fix mac py3 ci * fix bug for cpu version paddle * fix windows ci bug * change input argument error type * change const_cast to mutable_data * add async_write out-of-bound check and consumate error hint * fix a small bug for dst_tensor * add docs and refine codes * refine docs * notest,test=windows_ci * fix windows ci * fix require * fix code-block * add core.is_compiled_with_cuda()
-
由 Wangzheee 提交于
-
由 taixiurong 提交于
[XPU AMP] 1. xpu support gradient acc 2. xpu support create tensor in dygraph 3. xpu support update weight params in amp (#36439)
-
由 JingZhuangzhuang 提交于
-
- 17 10月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
-