- 17 Dec, 2021 11 commits
-
-
Committed by zlsh80826
According to --ptxas-options=-v, SegmentOpsKernel uses 66 registers per thread. There are two ways to resolve this problem: (1) reduce the threads per block in the launch configuration, or (2) add __launch_bounds__ so that the nvcc compiler has the information it needs to reduce register usage. This PR chooses the __launch_bounds__ solution because changing gpu_launch_config may affect other ops.
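
The register pressure described in this commit can be checked with simple arithmetic. The sketch below is illustrative only: it reads the 66 registers as a per-thread count (which is how ptxas reports usage), and it assumes a block of 1024 threads and a limit of 65536 32-bit registers per block. Neither of the last two figures appears in the commit message, and the helper names are hypothetical.

```python
# Assumption: one thread block may use at most 64K 32-bit registers.
REGISTERS_PER_BLOCK_LIMIT = 65536


def registers_needed(regs_per_thread, threads_per_block):
    """Total registers a block requests (ignores allocation granularity)."""
    return regs_per_thread * threads_per_block


def block_fits(regs_per_thread, threads_per_block,
               limit=REGISTERS_PER_BLOCK_LIMIT):
    """True if the block's register request stays within the assumed limit."""
    return registers_needed(regs_per_thread, threads_per_block) <= limit


def max_regs_per_thread(threads_per_block, limit=REGISTERS_PER_BLOCK_LIMIT):
    """Largest per-thread register count that still lets the block launch."""
    return limit // threads_per_block


if __name__ == "__main__":
    # 66 registers per thread with an assumed 1024-thread block.
    print(registers_needed(66, 1024), block_fits(66, 1024))
    # Option 1 from the commit message: fewer threads per block.
    print(block_fits(66, 512))
    # Option 2: a launch bound tells the compiler the per-thread budget.
    print(max_regs_per_thread(1024))
```

Under these assumptions, either halving the block size or capping the kernel at the per-thread budget brings the launch back within the limit; the commit takes the second route so that the shared launch configuration stays untouched.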
-
Committed by zhaoyingli
* add gpt modeling * update file name
-
Committed by niuliling123
-
Committed by From00
* Get GPU BasePtr from CUDA allocation * Fix compile error for ROCm * Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc * Add alignment for BuddyAllocator * Set address alignment of BuddyAllocator to 32 bytes * Fix CI error * Remove code for naive_best_fit strategy
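
As a rough illustration of what a 32-byte address alignment means, the sketch below rounds a size or offset up to the next multiple of the alignment. It is a minimal example, not the BuddyAllocator code; the function name `align_up` is hypothetical.

```python
def align_up(value, alignment=32):
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if value < 0:
        raise ValueError("value must be non-negative")
    # Adding alignment - 1 and clearing the low bits rounds up.
    return (value + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    for size in (0, 1, 32, 33, 100):
        print(size, "->", align_up(size))
```

The bit-mask form only works for power-of-two alignments, which is why the function rejects other values.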
-
Committed by From00
-
Committed by Chen Weihang
-
Committed by Yuang Liu
-
Committed by limingshu
* fix_bugs_for_elementwise_branch_selection * fix merge_dims bugs * fix all influenced file
-
Committed by houj04
-
Committed by jianghaicheng
* ipu add dockerfile * resolve comments
-
Committed by WangXi
-
- 16 Dec, 2021 29 commits
-
-
Committed by Sing_chan
-
Committed by Leo Chen
* fix cmake * not check execution time
-
Committed by Sing_chan
-
Committed by chentianyu03
-
Committed by Tomasz Socha
* Faster implementation of CPU kernel for ROI_ALIGN Operator * Add missing variable to CUDA roi_align_op * Style * Fix boundaries * Rename variables for indexes calculation * Remove unnecessary emplace * Revert "Remove unnecessary emplace" This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a. * Style
-
Committed by chentianyu03
-
Committed by Zhanlue Yang
* Rearranged Eager AutoCodeGen directory structure * Removed USE_OP in Eager AutoCodeGen * Enabled generation for Operators without Grad/Inputs/Outputs * Resolved operators without input * Fixed merge conflicts * Enabled Eager AutoCodeGen for 10+ more operators * Refactored Eager AutoCodeGen with more organized helper objects * Enabled Eager AutoCodeGen for operators with multiple OpBases * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen * Enabled Eager AutoCodeGen for All Existing Operators & Possible Future Operators * Fixed CI issues * Fixed LD_LIBRARY_PATH for eager_code_generator
-
Committed by YUNSHEN XIE
-
Committed by xiaoting
* add activation * update activation_op * add unitest for activation * fix acosh for init, test=develop
-
Committed by feng_shuai
* conv_transpose_eltwiseadd_bn_fuse_pass * change timeout * add TIMEOUT * add random num for group and dilation * change PassCompat
-
Committed by yeliang2258
* add test for conv_elementwise_add2_act_fuse_pass and conv_elementwise_add_act_fuse_pass * Add conv_eltwiseadd_bn_fuse_pass test and fix test_conv_elementwise_addX_act_fuse_pass * add tests for conv_act_mkldnn_fuse_pass * add test for conv_bias_mkldnn_fuse_pass * update code * add conv_act_mkldnn_fuse_pass for relu, relu6, swish, leaky_relu * update test * update * update bug * update * update pattern_detector * fix test_conv_eltwiseadd_bn_fuse_pass * add diff display notest;test=windows_ci_inference * fix * remove test_conv_act_mkldnn_fuse_pass.py * ifix
-
Committed by Chen Weihang
* unify device context entrance * move all_context include to header * polish cmake relay for device_context * fix npu compile failed * fix npu compile failed
-
Committed by YUNSHEN XIE
-
Committed by zhangchunle
-
Committed by Chen Weihang
* add register_ctx_kernel and move scale kernel * polish details by reviewer comment * fix xpu compile failed * fix cmake error
-
Committed by wuhuanzhou
-
Committed by Jiabin Yang
* support eager switch system * polish code
-
Committed by danleifeng
* trainer_device fix and checknan tool for psgpu;test=develop * disable show_one_table;test=develop
-
Committed by tianshuo78520a
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2. * add os_info * update * update * update * update * update * update for bugfix * update * update * update Co-authored-by: liutiexing <liutiexing@google.com>
-
Committed by LJQ❤️
Add elementwise_fmax and elementwise_fmin operators
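
The commit message does not describe the operators' semantics. The sketch below assumes they follow the C library `fmax`/`fmin` convention, in which a NaN operand is ignored when the other operand is a number. That convention is an assumption here, and all names in the example are hypothetical pure-Python stand-ins, not the operator implementation.

```python
import math


def fmax(a, b):
    """Maximum of a and b that ignores a single NaN operand (C fmax style)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def fmin(a, b):
    """Minimum of a and b that ignores a single NaN operand (C fmin style)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def elementwise(op, xs, ys):
    """Apply a binary op pairwise to two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return [op(x, y) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    nan = float("nan")
    print(elementwise(fmax, [1.0, nan, 3.0], [2.0, 5.0, nan]))
    print(elementwise(fmin, [1.0, nan, 3.0], [2.0, 5.0, nan]))
```

The practical difference from a plain elementwise maximum or minimum is that one missing value does not poison the result; only when both operands are NaN is the result NaN.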
-
Committed by Liu-xiandong
Add key_padding_mask and attn_mask to the sparse_attention API. 1. The key padding mask is a tensor with dimensions [batch_size, seq_len], and the attention mask is a tensor with dimensions [seq_len, seq_len]. Both masks use the same data type as Q, K, and V, which is float32 or float64. A value of 0 in a mask means that the position is masked. 2. The changed files are mainly paddle/fluid/operators/sparse_attention_op.cu and python/paddle/fluid/tests/unittests/test_sparse_attention_op.py. sparse_attention has three parts: sddmm, softmax, and dsd. Adding the mask operation only requires modifying the softmax part; it has no effect on the other two. In addition, related tests have been added to cover the mask function.
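
The masking rule in this commit (a 0 in either mask removes a position, and only the softmax step changes) can be sketched for a single row of attention scores. This is a dense, pure-Python illustration rather than the sparse CUDA kernel; the function name is hypothetical, and returning all zeros for a fully masked row is an assumption, since the commit message does not say how that case is handled.

```python
import math


def masked_softmax_row(scores, key_padding_row, attn_row):
    """Softmax over one row of scores, skipping positions masked with 0.

    key_padding_row: this batch item's row of the [batch_size, seq_len] mask.
    attn_row: this query position's row of the [seq_len, seq_len] mask.
    """
    if not (len(scores) == len(key_padding_row) == len(attn_row)):
        raise ValueError("scores and masks must have the same length")
    # A position survives only if neither mask marks it with 0.
    keep = [kp != 0 and am != 0 for kp, am in zip(key_padding_row, attn_row)]
    kept = [s for s, k in zip(scores, keep) if k]
    if not kept:
        # Assumption: a fully masked row yields zero weights.
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # The last key is padding, so it receives zero attention weight.
    print(masked_softmax_row([1.0, 2.0, 3.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]))
```

Because masked positions contribute nothing to the normalizing sum, the surviving weights still add up to 1, which matches the note that the sddmm and dsd stages are unaffected.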
-
Committed by niuliling123
* Add the transformop parameter in TensorReduceFunctorImpl
-
Committed by YuanRisheng
* Reduce reshape kernel functions in pten * delete notes * fix bugs when compile * modify register name * fix compile bugs
-
Committed by Chen Weihang
* unify device context entrance * move all_context include to header * polish cmake relay for device_context * fix npu compile failed * fix npu compile failed * revert part of change
-
Committed by Chen Weihang
-
Committed by chentianyu03
* Revert "Revert "pylayer support tuple/list type args (#37727)" (#37956)" This reverts commit d848ff04. * move check args,kwargs before forward execute
-
Committed by Li Min
* Add float16 type for scatter op. * Add fp16 test for scatter op. * Add int and int64 support for scatter_grad on gpu. * Add int and int64 for check_variable_and_dtype routine. * Minors. * Code format.
-