- 19 Oct, 2022 1 commit
-
-
Committed by will-jl944
-
- 18 Oct, 2022 2 commits
-
-
Committed by seemingwang
* add embedding range check * change head file * change head file * fix
-
Committed by liu zhengxi
-
- 17 Oct, 2022 2 commits
-
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * support pure bfloat16 * support bf16 linear * update PR to pass CI * tiny fix where_grad_kernel.cu * Support bfloat16 type for reducer and sharding. * Fix some bug. * Polish code. * Polise code. * Add bfloat16 datatype in fill_grad kernels. Co-authored-by: sneaxiy <sneaxiy@126.com>
-
Committed by YuanRisheng
* namespace modify * update by comment
-
- 13 Oct, 2022 4 commits
-
-
Committed by xiaohemaikoo
-
Committed by zhouweiwei2014
-
Committed by Zhang Ting
* Revert "[Hackathon No.56&38] deformable_conv_v1 operator: float16 data type support & forward-pass speedup (#46111)"
-
Committed by Zhang Zheng
* Correct the logic and remove unnecessary template param * fix error throw * fix print format * fix ci
-
- 12 Oct, 2022 1 commit
-
-
Committed by Zhang Ting
This reverts commit 8a5f17e8.
-
- 11 Oct, 2022 1 commit
-
-
Committed by Feiyu Chan
-
- 10 Oct, 2022 3 commits
- 30 Sep, 2022 3 commits
-
-
Committed by Zhang Zheng
* Optimize performance of depthwise_conv_bwd of filter * op-benchmark * fix * op benchmark * merge bwd
-
Committed by Zhang Zheng
* Optimize performance of depthwise_conv_bwd * fix
-
Committed by sneaxiy
* support pure bfloat16 * support bf16 linear * update PR to pass CI * tiny fix where_grad_kernel.cu * add bfloat16 to selu_grad to pass CI * fix selu grad compilation error
-
- 29 Sep, 2022 3 commits
-
-
Committed by Zhang Zheng
* Move valid check from python to kernel * fix error throw * fix * invalid label check * fix * Revert "fix" This reverts commit 79fad6799cfa4b30423dbc84e67d7d843d22b84a. * Revert "invalid label check" This reverts commit 402a9707390ad5386b3222e85844b92d2e9b9fa4. * Revert "fix" This reverts commit 09ba3080ee0587447f875c19cdf060485f15ae3b. * Revert "fix error throw" This reverts commit a901bfcc2179d5c120ec29af766f392b122dab52. * Revert "Move valid check from python to kernel" This reverts commit baa03cc4ef82d8d45516c30dfb52bf5aead30748. * final fix * fix * fix
-
Committed by carryyu
* fix P40 topk: Make the optimized topk compatible with P40. * fix P40 topk: Make the optimized topk compatible with P40. * fix P40 topk: Make the optimized topk compatible with P40.
-
Committed by 傅剑寒
-
- 28 Sep, 2022 1 commit
-
-
Committed by Chen Weihang
* remove needless using tensor * remove needless using tensor * resolve conflict * replace tensor using * fix format error * revert needless changing * fix rocm and npu compile error * fix cinn compile error * fix format error * fix mkldnn format error * fix mkldnn format error * fix cinn compile error * fix cinn compile error * fix cinn compile error * resolve conflict
-
- 26 Sep, 2022 1 commit
-
-
Committed by zhaoyingli
-
- 23 Sep, 2022 2 commits
-
-
Committed by Zhang Zheng
* Optimize performance of depthwise_conv_fwd * fix
-
Committed by YuanRisheng
-
- 22 Sep, 2022 1 commit
-
-
Committed by carryyu
* Optimize topk's performance when k is small and input_width is large * Modify the blockdim setting logic * Update top_k_function_cuda.h
-
- 21 Sep, 2022 3 commits
-
-
Committed by ccrrong
* add fp16 support * update * update half * code format * fix unittest * fix rocm compile error * code format * code format * fix rocm compile error * fix rocm compile error
-
Committed by Zhen Wang
* use cinn in the paddle inference * fix some cmake errors * Avoid division by zero in the arange_kernel. * Avoid dynamic ops. * Remove some useless codes. * Use OpTransInfo to encapsulate some codes used in the build_cinn_pass.
-
Committed by 5u13
-
- 20 Sep, 2022 4 commits
-
-
Committed by 傅剑寒
-
Committed by YuanRisheng
-
Committed by Jiabin Yang
* fix linspace error in amp * fix log * fix amp error * fix ocr error which caused by amp * add more check * rename dtype ns
-
Committed by HongyuJia
* polish code comments * polish data_device_transform.cc
-
- 19 Sep, 2022 3 commits
-
-
Committed by YuanRisheng
* move sum * fix ci bugs * fix ci bugs * fix set_lod bugs * fix infershape bugs * fix ci bugs * fix ci unittest bug * fix ci bugs * perfect code * update code according comment * add unittest * fix ci bugs
-
Committed by Chen Weihang
This reverts commit c252b1de.
-
Committed by RichardWooSJTU
* fix return order error and duplicate results with specific inputs
-
- 18 Sep, 2022 1 commit
-
-
Committed by YuanRisheng
* perfect softmax functor * fix compile bugs * fix ci bugs
-
- 16 Sep, 2022 2 commits
-
-
Committed by sneaxiy
* support int64 non-broadcast * support broadcast case for int64 index * fix bug * support more Arity * remove some codes * upgrade patchelf to v0.15.0 to pass CI build * fix bug * fix patchelf installation * add debug flags * remove useless codes * fix viterbi_decode and set_value op uts * remove always enable int64
-
Committed by Zhang Zheng
-
- 15 Sep, 2022 2 commits