- 18 12月, 2021 1 次提交
-
-
由 王明冬 提交于
-
- 17 12月, 2021 19 次提交
-
-
由 Jiabin Yang 提交于
* support more eager tensor api * support multiple constructor for eager tensor * add place related code * polish code * specific randint with dtype of int64 * Support pure cpu test * refine test in pure cpu * refine test in pure cpu
-
由 Leo Chen 提交于
-
由 Chen Weihang 提交于
-
由 sneaxiy 提交于
* support multi precision update for LAMB * hide some api * fix ci uts * fix lamb output of dygraph * remove some changes to some PR * try to fix Py3 CI compile error * fix test_imperative_optimizer, add lars ut, add layer_norm ut * fix ut, fix format * fix ut * fix windows ci
-
由 feng_shuai 提交于
-
由 Sing_chan 提交于
-
由 chentianyu03 提交于
* modify sum mean args * add GetExpectedPtenKernelArgs for redcue_op * modify kernel args number * modify kernel args number
-
由 LiYuRio 提交于
-
由 Zhanlue Yang 提交于
* Rearranged Eager AutoCodeGen directory structure * Removed USE_OP in Eager AutoCodeGen * Enabled generation for Operators without Grad/Inputs/Outputs * Resolved operators without input * Fixed merge conflicts * Enabled Eager AutoCodeGen for 10+ more operators * Refactored Eager AutoCodeGen with more organized helper objects * Enabled Eager AutoCodeGen for operators with multiple OpBases * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen * Adjusted function generation/call between Python-C API & Dygraph API * Synchronized auto-generated Python-C API with Dygraph Forward Functions * Generated CoreOpsInfos for potential use in append_op API * Fixed CI problem
-
由 kuizhiqing 提交于
-
由 Leo Chen 提交于
* Inspect the information inside a TRT engine. * Follow up the google code style. * Fix code error.
-
由 zlsh80826 提交于
From --ptxas-options=-v, SegmentOpsKernel uses 66 registers in a block. There are two ways to resolve this problem: Reduce the threads per block launch configuration add __launch_bound__ to give information to nvcc compiler for reducing registers usage this PR chooses __launch_bound__ solution because changing gpu_launch_config may affect other ops.
-
由 niuliling123 提交于
-
由 From00 提交于
* Get GPU BasePtr from CUDA allocation * Fix compile error for ROCm * Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc * Add alignment for BuddyAllocator * Set address alignment of BuddyAllocator to 32 bytes * Fix CI error * Remove code for naive_best_fit strategy
-
由 From00 提交于
-
由 Chen Weihang 提交于
-
由 Yuang Liu 提交于
-
由 limingshu 提交于
* fix_bugs_for_elementwise_branch_selection * fix merge_dims bugs * fix all influenced file
-
由 houj04 提交于
-
- 16 12月, 2021 20 次提交
-
-
由 Leo Chen 提交于
* fix cmake * not check execution time
-
由 chentianyu03 提交于
-
由 Tomasz Socha 提交于
* Faster implementation of CPU kernel for ROI_ALIGN Operator * Add missing variable to CUDA roi_align_op * Style * Fix boundaries * Rename variables for indexes calculation * Remove unnecessary emplace * Revert "Remove unnecessary emplace" This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a. * Style
-
由 chentianyu03 提交于
-
由 Zhanlue Yang 提交于
* Rearranged Eager AutoCodeGen directory structure * Removed USE_OP in Eager AutoCodeGen * Enabled generation for Operators without Grad/Inputs/Outputs * Resolved operators without input * Fixed merge conflicts * Enabled Eager AutoCodeGen for 10+ more operators * Refactored Eager AutoCodeGen with more organized helper objects * Enabled Eager AutoCodeGen for operators with multiple OpBases * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen * Enabled Eager AutoCodeGen for All Existing Operators & Possible Future Operators * Fixed CI issues * Fixed LD_LIBRARY_PATH for eager_code_generator
-
由 xiaoting 提交于
* add activation * update activation_op * add unitest for activation * fix acosh for init, test=develop
-
由 feng_shuai 提交于
* conv_transpose_eltwiseadd_bn_fuse_pass * change timeout * add TIMEOUT * add random num for group and dilation * change PassCompat
-
由 yeliang2258 提交于
* add test for conv_elementwise_add2_act_fuse_pass and conv_elementwise_add_act_fuse_pass * Add conv_eltwiseadd_bn_fuse_pass test and fix test_conv_elementwise_addX_act_fuse_pass * add tests for conv_act_mkldnn_fuse_pass * add test for conv_bias_mkldnn_fuse_pass * update code * add conv_act_mkldnn_fuse_pass for relu, relu6, swish, leaky_relu * update test * update * update bug * update * update pattern_detector * fix test_conv_eltwiseadd_bn_fuse_pass * add diff display notest;test=windows_ci_inference * fix * remove test_conv_act_mkldnn_fuse_pass.py * ifix
-
由 Chen Weihang 提交于
* unify device context entrance * move all_context include to header * polish cmake relay for device_context * fix npu compile failed * fix npu compile failed
-
由 YUNSHEN XIE 提交于
-
由 Chen Weihang 提交于
* add register_ctx_kernel and move scale kernel * polish details by reviewer comment * fix xpu compile failed * fix cmake error
-
由 Jiabin Yang 提交于
* support eager switch system * polish code
-
由 danleifeng 提交于
* trainer_device fix and checknan tool for psgpu;test=develop * disable show_one_table;test=develop
-
由 liutiexing 提交于
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2. * add os_info * update * update * update * update * update * update for bugfix * update * update * update Co-authored-by: Nliutiexing <liutiexing@google.com>
-
由 LJQ❤️ 提交于
Add elementwise_fmax and elementwise_fmin operators
-
由 Liu-xiandong 提交于
Add key_padding_mask and attn_mask in sparse_attention Api 1.Key padding mask is a tensor with dimensions [batch_size, seq_len], and attention mask is a tensor with dimensions [seq_len, seq_len]. The data types of the two masks are consistent with Q, K, and V, which are float32 or float64. If the value in Mask is 0, it means that the position needs to be masked. 2.The changed files are mainly paddle/fluid/operators/sparse_attention_op.cu and python/paddle/fluid/tests/unittests/test_sparse_attention_op.py. sparse_attention has three parts: sddmm, softmax, and dsd. Adding the mask operation only needs to modify the softmax. It has no effect on the other two parts. In addition, in order to test the mask function, related tests has been added.
-
由 niuliling123 提交于
* Add the transformop parameter in TensorReduceFunctorImpl
-
由 YuanRisheng 提交于
* Reduce reshape kernel functions in pten * delete notes * fix bugs when compile * modify register name * fix compile bugs
-
由 Chen Weihang 提交于
* unify device context entrance * move all_context include to header * polish cmake relay for device_context * fix npu compile failed * fix npu compile failed * revert part of change
-