- 24 Mar, 2023 7 commits
-
-
Committed by YuanRisheng
* decouple memory copy * fix ci bugs * fix ci compile bugs * fix rocm compile * fix ci bugs * decouple memory * deal with conflict * fix xpu compile bugs * fix xpu bugs * deal with xpu bugs * fix cmake bugs * fix windows bugs * fix ci bugs * fix ci bugs * delete redundance code * add code for pybind * fix py3 bugs * fix ci bugs
-
Committed by wanghuancoder
* delete old dygraph, mlu npu do not use dygraph
-
Committed by Zheng-Bicheng
update
-
Committed by Zheng-Bicheng
-
Committed by ZhangDY-6483
* first version, notest * return final rst, notest * use infinity() instead of max * ut structure * start up of ut * generate lse * update * add depense * reconstruct cmake * move file * add memory efficient attention and fix blasimpl * update * update cmake * add namespace * update cmake * use .cu * update for pad3d * bug fix * bug fix * update * bug fix * update enforce * add test case * merge the lse pad * fix kernel_fn of backward * fix PADDLE_ENFORCE_EQ and phi_api * fix PADDLE_ENFORCE * fix PADDLE_ENFORCE * rerun coverage * fix memory efficient attention test * rerun ci * add cuda version condition * add cuda version condition * delete WIP test * replace PADDLE_ENFORCE * edit the namespace of datatype in multiple.cc * rerun * rerun --------- Co-authored-by: Nliuyuang <liuyuang@baidu.com>
-
Committed by wanghuancoder
* xpu do not test dygraph in dygraph
-
Committed by Yuang Liu
-
- 23 Mar, 2023 18 commits
-
-
Committed by HongyuJia
-
Committed by Wangzheee
-
Committed by xiaoguoguo626807
* delete prim flag for matmul_2_grad * delete prim flag for matmul_2_grad * add new setgradoutmeta for matmul_double_grad_node * modify test and delete log * deal with review
-
Committed by chenjian
* add meshgrid composite rule * add meshgrid composite rule * update * add into CMakeLists * fix * update * update * optimize code * fix meshgrid op * update test
-
Committed by wanghuancoder
* delete old dygraph xpu op test
-
Committed by Huang Jiyi
* unify add_position_encoding * unify affine_channel * unify alloc_float_status * unify allreduce * unify alltoall * unify anchor_generator * unify ascend_trigger * fix bug * fix test
-
Committed by Huang Jiyi
* update * update * update * update * update * fix test
-
Committed by cxxly
-
Committed by caozhou
* add patterns * update rule based tuner * add forward sub program completion * add unittest * add bwd sub program completion
-
Committed by Lin Manhui
* Add bf16 support for elementwise_pow * Update ut
-
Committed by Yuang Liu
-
Committed by yeliang2258
* add bf16 and fp16 tests * fix dtype check
-
Committed by LoneRanger
* add fp16 and bfp16 for temporalshift * add fp16 and bfp16 for complex * fix bug * fix bug * add fp16 and bf16 for conj * fix bug * fix bug * Update complex_kernel.h fix bug * Update temporal_shift_grad_kernel.h fix bug * Update temporal_shift_kernel.h fix bug
-
Committed by Infinity_lee
-
Committed by PuQing
[CodeStyle][C408][C409][C410] Fix unnecessary <dict/list/tuple> call and unnecessary <list/tuple> passed to <list/tuple>() (#51928) * autofix * add select config * autofix C410 * add C410 select
-
Committed by denglianbin
* finish pr * skip cpu test for logical * change test style * fix error.
-
Committed by Infinity_lee
-
Committed by 张春乔
-
- 22 Mar, 2023 15 commits
-
-
Committed by YangQun
* support 0-d tensor for element wise unary ops * fix python code style check * fix approval check * support 0-d tensor for onednn softmax and logsoftmax kernels * fix comments * fix some unittests
-
Committed by ShenLiang
-
Committed by Ghost Screaming
* Add fused_feed_forward pass for semi-automatic static graph training. * Add fused_feedforward property in parallel_executor.cc * Polish code. * Polish fused feed_forward pass code. Support use_dropout1 and use_dropout2 option. * Support model parallel in fused_feedforward pass.
-
Committed by Sławomir Siwek
* extract common methods to reuse * add header for transpose ops * fused_transpose * Split big function * transpose2 tests * fused_transpose * Apply extra attributes * add pbtxt file * update pbtxt * Merge develop * add more strict op compats * code style * remove mkldnn_data_type * unify SetOutMemDescWithReshape2FuseSupport * adjust quantize-dequantize for transpose * remove appendact * transpose2 quantization * fix int8 tests * adjust transpose_op to current develop * delete fusion code from transpose_kernel * add fused transpose to NHWC unittest * change order
-
Committed by Bo Zhang
* test_logit_op * add cudaKernel to replace eigen impl * bf16 unit test CI
-
Committed by houj04
-
Committed by Zhang Zheng
* [AMP OP&Test] Fix fp16 check_grad when user_defined_grads are not None * fix cond
-
Committed by HongyuJia
* [CustomOP Optional] CustomOP supports optional Tensor * fix test_custom_concat, add pytest to CMakeLists
-
Committed by LoneRanger
* remove net_drawer.py * remove memory_analysis.py * remove test_memory_analysis.py
-
Committed by niuliling123
-
Committed by kangguangli
* fix raw_program_optimizer not apply when using amp * fix CI
-
Committed by wangxiaoning
* max comp * fix * add test * fix * fix * fix * fix * fix test * fix api
-
Committed by sneaxiy
* add fused_linear_param_grad_add_kernel * fix compile error * remove flag * fix ci compile error * fix ci compile error * revert pylayer revision * fix ci ut * improve performance
-
Committed by Yuanle Liu
-
Committed by iSerendipity
-