- 11 Jan, 2023 1 commit
-
-
Committed by Yiqun Liu
* Implement a common PointerArray.
* Polish codes.
* Add including of header file.
* Add the branch of kFix8.
* Fix compiling error.
* Add alignas hint to fix the performance drop.
* Optimize the H2D copy in stack_grad.
* Rename the macro.
* Fix align hint for different compilers.
* Polish the define of PADDLE_ALIGN.
* Fix compiling error.
* Remove the align hint on windows.
-
- 10 Jan, 2023 3 commits
-
-
Committed by limingshu
* add stack grad kernel optimization
* add basic optimization kernel for stack_grad_kernel
* optimization of stack_grad_kernel for last dim stack and change code format with pre-commit
-
Committed by Ryan
* try sequence_padding
* fix cant use mutable_data
* fix mistake fluid_sequence_scale.hh/CMakeLists.t include
* fix namespace bug
* fix framework::ToAbsOffset not found
* fix codestyle
-
Committed by MarDino
-
- 09 Jan, 2023 4 commits
-
-
Committed by MarDino
* add concat optimization
* refine
* remove annotation
* use alignas instead of aligned_storage
-
Committed by QingshuChen
-
Committed by ykkk2333
* migrate shaple sgd, split,sign xpu kernels to phi, test=kunlun
* fix dlrm throughput problem, test=kunlun
* add xpu einsum, fill_diagonal, and diagonal kernels, test=kunlun
-
Committed by wangzhen38
-
- 06 Jan, 2023 3 commits
-
-
Committed by RuohengMa
* add bitwise and, bitwise not, bitwise or and bitwise xor
* correct typo
-
Committed by JYChen
* add 0-d support for paddle.kthvalue
* add 0-d support for paddle.mode
* fix coverage test for device
* fix check-bug in windows
* change axis check from LT to LE
* add shape & value check for grad when input is 0d tensor
-
Committed by Thomas Young
-
- 05 Jan, 2023 2 commits
-
-
Committed by Siming Dai
* support 0D for paddle.sort/argsort
* support 0D tensor for paddle.sort/argsort in xpu
* fix bug
* fix grad and add value assertion
-
Committed by zyfncg
-
- 04 Jan, 2023 3 commits
-
-
Committed by Wilber
-
Committed by Yuanle Liu
-
Committed by HongyuJia
* execute use kernel_key first
* change OpKernelType->KernelKey
* fix py3 compile error, remove redundant header files
* fix build_strategy_test
* fix DataType::RAW
* fix custom_type test: operator_test.cc
* fix transform place
* fix backends_are_same_class
* try fix place TransDataDevice
* support all KernelKey
* fix TransformData
* fix place_are_same_class
* fix merge
* fix test_params_no_grad
* fix specific place of GetExpectedKernelType
* fix specific place of GetExpectedKernelType
* fix GetKernelTypeForVar
* fix dtype error
* fix fetch_v2
* change GetKernelTypeForVar
* fix interpreter
* fix typo error
* polish codes
* polish codes
* polish codes
* fix conflict
-
- 03 Jan, 2023 3 commits
-
-
Committed by limingshu
-
Committed by zhoutianzi666
* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.
-
Committed by Yiqun Liu
* Use BroadcastKernel and ReduceKernel to optimize expand and expand_grad.
* Correct the axis when there is only 1 input in BroadcastKernel.
* Add the calculate of output's shape.
-
- 31 Dec, 2022 1 commit
-
-
Committed by caozhou
-
- 30 Dec, 2022 1 commit
-
-
Committed by Sanbu
* 1219
* temporarily change the num_diff_files limit, test=document_fix
* Revert "temporarily change the num_diff_files limit, test=document_fix" This reverts commit 8e70f00ef468d2dad0e38b3da06295ed62990d20.
* for codestyle
* remove duplicate license
* `static mode` -> `static graph mode`
* Update hybrid_parallel_inference.py
* Update layer_function_generator.py
* Update manipulation.py
* reset

Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
-
- 29 Dec, 2022 1 commit
-
-
Committed by ykkk2333
-
- 28 Dec, 2022 3 commits
-
-
Committed by sprouteer
-
Committed by xiaoxiaohehe001
-
Committed by Haohongxiang
-
- 27 Dec, 2022 3 commits
-
-
Committed by zhangyikun02
-
Committed by xiaoting
* fix fold for large bs
* fix fold for large bs
-
- 26 Dec, 2022 2 commits
- 23 Dec, 2022 6 commits
-
-
Committed by QingshuChen
-
Committed by Yuanle Liu
-
Committed by Charles-hit
* fix matmul double and triple grad
* remove some comment
* add matmul_double_grad unit test
* fix matmul triple grad
* fix dot triple grad and add unit test
* modify codestyle
* fix dot_grad
* refactor dot triple grad
* disable some unit test
* fix unit test
* fix unit test in double grad
-
Committed by haosicheng
-
Committed by Hui Zhang
* add warp transducer code
-
Committed by MarDino
* register half datatype
* register roll grad fp16 kernel
-
- 22 Dec, 2022 4 commits
-
-
Committed by xiaoxiaohehe001
-
Committed by Zhang Zheng
* Optimize performance of batch_norm_bwd with NHWC layout and infer mode
* fix
-
Committed by Zhang Zheng
-
Committed by QingshuChen
-