- 20 Oct, 2022 1 commit
-
-
Committed by sneaxiy
support pure bfloat16 for more ops
-
- 12 Aug, 2022 1 commit
-
-
Committed by Siming Dai
* add init file
* add op definition and infermeta
* add kernel definition funcs
* add broadcast infer shape
* add gpu forward kernel
* delete SUB and DIV
* add x_grad
* add template
* add e_grad for min and max
* fix small bug
* temp commit
* temp commit
* add e_grad for sum and mean
* fix some compile bug
* fix compile bugs
* fix compile problem
* add sum forward unittest
* fix broadcast error, add kernel sig, register e_grad, change unit test
* fix grad
* add temp grad fix
* temp commit
* add min max unittest
* add max, min unittest, fix mul bug
* add cpu forward sum and mean
* add forward min max, fix mean unittest
* add cpu backward min max
* fix code-style
* add backward sum mean
* fix rocm ci
* set unittest timeout
* fix bug of x broadcast to e, gpu grad
* fix bug of x broadcast to e, cpu grad
* rename BOOST_GET_CONST macro
* fix rocm ci
* mv graph_send_e_recv to graph_send_ue_recv
* move out_size to IntArray
* add eager op test
* fix max pool type bug, add unittest for api
* revise api doc
* add fp16 for atomic min and max, add unittest
* add unittest
* add fp16 support for graph_send_recv
* fix unittest fp16 bug
* change OutSizeTensor to Out_size
* move E to Y
* add copyright, fix comment
* review code
* fix thread block size
* fix thread block size
* change api attribute name: pool_type to reduce_op, compute_type to message_op
* change api attribute name, move pool_type to reduce_op, move compute_type to message_op
-
- 26 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 01 Mar, 2022 1 commit
-
-
Committed by zhangbo9674
* add scale gather sum
* refine CUDA_ATOMIC_WRAPPER ADD for bf16
* add gather unittest
* solve conflict
* add scale unittest
* add sum unittest
* solve conflict
* refine gather unittest
* refine unittest
-
- 25 Feb, 2022 1 commit
-
-
Committed by Li Min
* Fix compile error on cuda_arch less than 700.
-
- 24 Feb, 2022 1 commit
-
-
Committed by Li Min
* optimize block config and fp16 atomicAdd perf for lookup_table_v2_grad.
-
- 09 Dec, 2021 1 commit
-
-
Committed by sneaxiy
* fix cuda atomicAdd for FP16
* try to fix ci
-
- 03 Dec, 2021 1 commit
-
-
Committed by ronnywang
* refine structure for cuda and rocm
* update
* update
* update
* update
-
- 19 Nov, 2021 1 commit
-
-
Committed by Siming Dai
* add cpu version, using set: sum, min, max
* add cpu version: mean
* improve cpu code and fix dynamic memory allocation problem
* fix arg error, add index judge, delete fp16
* fix bug in CudaAtomicMax and CudaAtomicMin
* add CUDA version
* fix grad_op bug for index
* add op test, add correct cpu grad op
* Add correct CUDA Mean grad
* [Add] Successful MEAN and SUM
* [Add] Successful MIN and MAX in CPU
* [Add] Successful MIN and MAX in CUDA
* fix windows dtype ci
* fix ROCM ci by adding HIP flag
* rename fused_gather_scatter to send_recv
* unify name as send and recv
* change zero index return time
* add send_recv incubate api
* fix index data type, add unittest case for API
* delete redundant input tensor
* fix en example and docs, add default value in pool_type
* add shape judge and max grid judge
* fix comment
* fix index type bug
* add const &
* fix en docs
* delete numpy in examples
* add unittest for int input
* fix send_recv comment
* change send_recv to graph_send_recv
-
- 01 Jun, 2021 1 commit
-
-
Committed by chentianyu03
* replace and remove complex64/128 types in custom OP and other files
* fix custom_tensor_test fail bug
* fix custom_conj_test fail bug
* fix dispatch_test_op build fail bug
-
- 07 Apr, 2021 1 commit
-
-
Committed by furnace
-
- 08 Feb, 2021 1 commit
-
-
Committed by Qi Li
-
- 25 Dec, 2020 1 commit
-
-
Committed by Chen Weihang
* add support for complex grad accumulated
* add unittest for coverage
* update test dtype
* remove useless blank line
-
- 26 Sep, 2020 1 commit
-
-
Committed by Zhong Hui
fix cpplint error for the atomic max/min
-
- 25 Sep, 2020 1 commit
-
-
Committed by Zhong Hui
fix cuda atomic for ARCH<350 for the atomic_max
-
- 24 Sep, 2020 1 commit
-
-
Committed by Zhong Hui
Add GPU kernels for segment ops; support sum, max, min, mean
-
- 31 Jul, 2018 1 commit
-
-
Committed by dzhwinter
* "rewrite the test case"
* "follow comment"
-
- 30 Jul, 2018 1 commit
-
-
Committed by dzhwinter
* cherry picked
* "cherry picked platform"
* "add comment"
* "fix ci"
-
- 03 May, 2018 1 commit
-
-
Committed by chengduo
* fix __shfl_down_sync_ of cross_entropy
* use reduceSum
* "fix ci"
-
- 02 May, 2018 2 commits
-
-
Committed by chengduoZH
-
Committed by chengduoZH
-
- 30 Apr, 2018 1 commit
-
-
Committed by dzhwinter
* "re-commit"
* "picked up"
* "fix ci"
* "fix pdb hang up issue in cuda 9"
-
- 10 Apr, 2018 2 commits
- 28 Feb, 2018 1 commit
-
-
Committed by chengduoZH
-
- 26 Feb, 2018 1 commit
-
-
Committed by chengduoZH
-
- 24 Feb, 2018 1 commit
-
-
Committed by chengduoZH
-
- 12 Feb, 2018 1 commit
-
-
Committed by qingqing01
-
- 10 Feb, 2018 1 commit
-
-
Committed by Yi Wang
-
- 23 Nov, 2017 1 commit
-
-
Committed by Yu Yang
* Support int64 for sum op
* Refine code
-
- 18 Sep, 2017 1 commit
-
-
Committed by 武毅
* refine accuracy_op
* follow comments
* follow comments
-
- 23 Aug, 2017 1 commit
-
-
Committed by dangqingqing
-
- 22 Aug, 2017 2 commits
-
-
Committed by dangqingqing
-
Committed by dangqingqing
1. finish lookup table CPU and GPU kernel
2. Add some cuda helper
3. Add some math functor
-