- 21 Oct, 2021 12 commits
-
-
Committed by zhulei
-
Committed by ronnywang
-
Committed by jakpiase
* added base changes for matmul_v2+trans+resh fuse pass
* added full matmul_v2+transpose+reshape pass
* removed a file added by mistake
* added reviewers suggestions
* Changed ops type in checking compatibility version
* Deleted one statement
-
Committed by furnace
* add sync_batch_norm (support train, infer, and fp32, fp16, and NCHW, NHWC)
* [NPU] Delete debug codes
* [NPU] Remove FP16
-
Committed by Jack Zhou
* add viterbi decode cpu kernel
* add viterbi decoder api in paddle.text
* add a data buffer once to avoid frequently creating many small data buffers
* fix viterbi max_seq_length bug
* fix seq_len=1 bug
* fix device context
* move split out of for loop
* remove INVERSE_SUB
* remove 2 GET_CAST_MASK
* remove 1 loop
* remove Functor
* add to_static deploy code
* use MAX_FUNC instead of ELE_MAX
* add MaxFunctor
* impl max_func
* remove MaxFunctor
* remove cast op
* use REGISTER_OP_WITHOUT_GRADIENT
* add viterbi cuda kernel
* add FIX_BLOCKDIM_CASE macro
* add MKL add, mul; add get data mask
* add arange mkl impl
* add CPU Argmax
* add cpu gather
* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
* use SAME_DIMS_ELEMENT_BINARY_OP
* add SimpleBroadcastBinaryOP
* use int instead of int64_t to accelerate
* optimize SimpleBroadcastBinaryOP
* optimize SimpleBroadcastBinaryOP
* optimize performance in both single thread and multithread situation
* remove useless line
* remove useless code
* add CREATE_TENSOR_BUFFER macro
* add INIT_REQUIRED_TENSOR macro
* add comment
* fix windows ci
* add viterbi unittest
* remove cuda add functor
* remove cuda equal
* remove a template function
* fix windows ci
* fix windows dtype
* remove some template instance
* remove useless header file
* remove some blockdim
* remove transpose impl
* accelerate cpu performance on single thread situation
* viterbi_decode->crf_decode
* rename crf params name
* add viterbi api test
* remove useless import
* add enable_static
* use viterbi decoder
* fix viterbi len=1
* fix viterbi unittest
* remove useless comments
* reconstruct viterbi decode
* remove ADD,SUB,MUL structure
* fix coverage
* remove CREATE_TENSOR
* add name args
* crf.py->ops.py; with_start_stop_tag->include_start_end_tag
* update crf_decode en docs
* fix viterbi decode en docs
* fix some review comments
* add FIXED_BLOCK_DIM_CASE in cuda
* push_back->emplace_back
* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
* fix viterbi_decode en docs
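
For readers unfamiliar with what a Viterbi decode kernel computes, the following is a minimal pure-Python sketch of the general algorithm for a single sequence. It is not the kernel added in this commit and does not mirror the `paddle.text.viterbi_decode` signature: the function name `viterbi_decode_sketch` and its two arguments are hypothetical, and BOS/EOS tags, batching, and variable sequence lengths are deliberately left out.

```python
def viterbi_decode_sketch(emissions, transitions):
    """Return (best_score, best_path) for one sequence (illustrative only).

    emissions[t][j]   : score of tag j at time step t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    # scores[j] holds the best score of any path ending in tag j so far.
    scores = list(emissions[0])
    backpointers = []

    for step in emissions[1:]:
        new_scores = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the score of reaching j.
            best_prev = max(
                range(num_tags),
                key=lambda i: scores[i] + transitions[i][j],
            )
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + step[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path back from the best final tag.
    last = max(range(num_tags), key=lambda j: scores[j])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return best_score, path
```

With emissions `[[2.0, 0.0], [0.0, 1.0]]` and transitions `[[0.0, -3.0], [0.0, 0.0]]`, picking the best tag at each step independently would choose tags 0 then 1, while the sketch returns the path `[0, 0]` because the 0-to-1 transition is heavily penalized. A sequence of length 1 simply returns the best tag of its only step.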
-
Committed by TTerror
* add some ops to train ssd on kunlun
* update test_fill_any_like_op_xpu.py
-
Committed by xiongkun
-
Committed by niuliling123
* Update the implementation of reduceAnyKernel according to the kernel primitive API
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
-
Committed by seemingwang
-
Committed by zhaocaibei123
* add ctr table depends
* code style
* fix
* fix
* fix naming
* rename
* rename
-
Committed by liutiexing
* add align for WorkQueue
* add spinlock
* merge develop
* merge
* Add EventsWaiter
* Revert "Add EventsWaiter". This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.
* adjust multithreading usage, fix flame graph
* update
-
Committed by Aurelius84
* Add kQueueSync.synchronize_run_ logic
* Support No DataTransform From GetKernelTypeForVar
-
- 20 Oct, 2021 13 commits
-
-
Committed by danleifeng
* split into PreBuildTask and BuildPull; solve endpass bug;test=develop
* change buildcpu into prebuild and buildcpu into build;test=develop
-
Committed by 李季
* fix global gather and global scatter operators
-
Committed by ronnywang
-
Committed by Wilber
-
Committed by Steffy-zxf
Add tokenizer-related functionality for Transformer models so that text processing is consistent between training and prediction.
* Support a text string as an input Tensor.
* Support the "VOCAB" unordered_map<wstring, int> as an input Tensor for token lookup.
* Tokenizer used for BERT: it applies end-to-end tokenization, from text string to wordpiece.
* It first applies basic tokenization, followed by wordpiece tokenization.
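
The two-stage flow described in this commit message (basic tokenization, then wordpiece tokenization against a vocabulary map) can be illustrated with a small pure-Python sketch. This is the commonly described greedy longest-match-first WordPiece scheme, not the operator added in this commit. The function names, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions made for illustration, and the basic tokenizer here only lowercases, splits on whitespace, and isolates punctuation.

```python
def basic_tokenize(text):
    """Lowercase, split on whitespace, and emit each punctuation mark alone."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
            continue
        # Any non-alphanumeric character ends the current word.
        if current:
            tokens.append("".join(current))
            current = []
        if not ch.isspace():
            tokens.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize_to_ids(text, vocab):
    """Text string -> basic tokens -> wordpieces -> vocabulary ids."""
    return [
        vocab[piece]
        for word in basic_tokenize(text)
        for piece in wordpiece_tokenize(word, vocab)
    ]
```

For example, with the toy vocabulary `{"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, "!": 5}`, the string `"Hello unaffable!"` is split into `hello`, `un`, `##aff`, `##able`, `!` and mapped to the ids `[4, 1, 2, 3, 5]`.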
-
Committed by wuhuachaocoding
-
Committed by Zeng Jinle
-
Committed by Wilber
-
Committed by Wilber
-
Committed by zmx
* bug fix for DeserializeSelectedRows. test=develop
* fix bug for SerializeSelectedRows. test=develop
* update. test=develop
-
Committed by Huihuang Zheng
Add a CINN compile option in CMake. You can now use CINN in Paddle by passing `-DWITH_CINN=ON` to `cmake`. To test it, run `make cinn_lib_test -j` and `ctest -R cinn_lib_test`.

Note:
1. Before running the test, set
```
export runtime_include_dir=${CINN_SOURCE_DIR}/cinn/runtime/cuda
```
where `${CINN_SOURCE_DIR}` should be set based on your CINN directory.
2. CINN is still under development, so you may have to change `CINN_GIT_TAG` to the git commit you need.
-
Committed by wenbin
* fix
* remove const
-
Committed by Aurelius84
-
- 19 Oct, 2021 13 commits
-
-
Committed by Weilong Wu
* Support elementwise_add triple grad Kernel
* Change code-format to follow CI std
-
Committed by zhulei
* [NPU] Add iou_similarity op
* [NPU] Add iou_similarity op
* [NPU] Add iou_similarity op
-
Committed by Qi Li
* [NPU] update inference cmake, test=develop
* address review comments, test=develop
* fix compile error when WITH_ASCEND_CXX11 ON, test=develop
-
Committed by danleifeng
-
Committed by Wilber
* update
* fix ut error
* update ut
-
Committed by jiangcheng
* add feed op and new var for the generated subgraph
* perfect the test script of build_cinn_pass
* remove useless clear and perfect some annotation
-
Committed by wangxinxin08
* add nearest_interp_v2 trt plugin
-
Committed by WangXi
-
Committed by littletomatodonkey
* fix replicate pad when input size is 0
* add unit test
-
Committed by Yulong Ao
* Add QR decomposition op
* Change code to adapt to new svd_helper
* Update linalg.py: Restore the deleted comma
* Restore the deleted line
* Update linalg.py
* Update linalg.py
* Improve the qr code based on reviews
* Update QR based on CI results
* Update qr doc, test=document_fix
* Change unsafe and ill-formed code
-
Committed by Xiaoxu Chen
-
Committed by zmx
-
Committed by Zeng Jinle
* add pow2_warmup op
* remove contrib __all__
* add AttrT
* rename
* follow comments
* fix duplicate PADDLE_RESTRICT
-
- 18 Oct, 2021 2 commits
-
-
Committed by jakpiase
* added softplus
* refactored softplus op
* deleted unnecessary file
* added missing file
* added formatting
* disabled tests if GPU is used
* added reviewer suggestion
* unified softplus kernel
-
Committed by xiaoxiaohehe001
* add_quant_axis
* add_quant_axis
* --amend
* Update quant_conv2d_dequant_fuse_pass.cc
-