- 25 Feb, 2022 1 commit
Committed by Li Min
* Fix compile error on cuda_arch less than 700.
- 24 Feb, 2022 1 commit
Committed by Li Min
* optimize block config and fp16 atomicAdd perf for lookup_table_v2_grad.
- 09 Dec, 2021 1 commit
Committed by sneaxiy
* fix cuda atomicAdd for FP16 * try to fix ci
- 03 Dec, 2021 1 commit
Committed by ronnywang
* refine structure for cuda and rocm * update * update * update * update
- 19 Nov, 2021 1 commit
Committed by Siming Dai
* add cpu version, using set: sum, min, max * add cpu version: mean * improve cpu code and fix dynamic memory allocation problem * fix arg error, add index judge, delete fp16 * fix bug in CudaAtomicMax and CudaAtomicMin * add CUDA version * fix grad_op bug for index * add op test, add correct cpu grad op * Add correct CUDA Mean grad * [Add] Successful MEAN and SUM * [Add] Successful MIN and MAX in CPU * [Add] Successful MIN and MAX in CUDA * fix windows dtype ci * fix ROCM ci by adding HIP flag * rename fused_gather_scatter to send_recv * unify name as send and recv * change zero index return time * add send_recv incubate api * fix index data type, add unittest case for API * delete redundant input tensor * fix en example and docs, add default value in pool_type * add shape judge and max grid judge * fix comment * fix index type bug * add const & * fix en docs * delete numpy in examples * add unittest for int input * fix send_recv comment * change send_recv to graph_send_recv
- 01 Jun, 2021 1 commit
Committed by chentianyu03
* replace and remove complex64/128 types in custom OP and other files * fix custom_tensor_test fail bug * fix custom_conj_test fail bug * fix dispatch_test_op build fail bug
- 07 Apr, 2021 1 commit
Committed by furnace
- 08 Feb, 2021 1 commit
Committed by Qi Li
- 25 Dec, 2020 1 commit
Committed by Chen Weihang
* add support for complex grad accumulated * add unittest for coverage * update test dtype * remove useless blank line
- 26 Sep, 2020 1 commit
Committed by Zhong Hui
fix cpplint error for the atomic max/min
- 25 Sep, 2020 1 commit
Committed by Zhong Hui
fix cuda atomic for ARCH<350 for the atomic max
- 24 Sep, 2020 1 commit
Committed by Zhong Hui
Add GPU kernels for segment ops; supports sum, max, min, mean
- 31 Jul, 2018 1 commit
Committed by dzhwinter
* "rewrite the test case" * "follow comment"
- 30 Jul, 2018 1 commit
Committed by dzhwinter
* cherry picked * "cherry picked platform" * "add comment" * "fix ci"
- 03 May, 2018 1 commit
Committed by chengduo
* fix __shfl_down_sync_ of cross_entropy * use reduceSum * "fix ci"
- 02 May, 2018 2 commits
Committed by chengduoZH
Committed by chengduoZH
- 30 Apr, 2018 1 commit
Committed by dzhwinter
* "re-commit " * "picked up" * "fix ci" * "fix pdb hang up issue in cuda 9"
- 10 Apr, 2018 2 commits
- 28 Feb, 2018 1 commit
Committed by chengduoZH
- 26 Feb, 2018 1 commit
Committed by chengduoZH
- 24 Feb, 2018 1 commit
Committed by chengduoZH
- 12 Feb, 2018 1 commit
Committed by qingqing01
- 10 Feb, 2018 1 commit
Committed by Yi Wang
- 23 Nov, 2017 1 commit
Committed by Yu Yang
* Support int64 for sum op * Refine code
- 18 Sep, 2017 1 commit
Committed by 武毅
* refine accuracy_op * follow comments * follow comments
- 23 Aug, 2017 1 commit
Committed by dangqingqing
- 22 Aug, 2017 2 commits
Committed by dangqingqing
Committed by dangqingqing
1. finish lookup table CPU and GPU kernel 2. Add some cuda helper 3. Add some math functor