- 30 Dec, 2021 4 commits
-
-
Committed by Chen Weihang
* remove offset in storage
* revert api change
* fix custom op slice bug
* fix mutable_data error
-
Committed by From00
-
Committed by Xiaoxu Chen
* add dirichlet sample op and cpu backend kernel
* add Dirichlet op cuda kernel (#6)
* add dirichlet op hip kernel
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
-
Committed by Leo Guo
* Fix a bug in the batch_norm and batch_norm_grad ops. Add the "roi_align" and "roi_align_grad" ops to the xpu2 op list.
* Fix a bug in the batch_norm and batch_norm_grad ops. Add the "roi_align" and "roi_align_grad" ops to the xpu2 op list. test=kunlun
Co-authored-by: Zibin <guozibin@baidu.com>
-
- 29 Dec, 2021 13 commits
-
-
Committed by Zhanlue Yang
-
Committed by yaoxuefeng
add hashtable dynamic mf support
-
Committed by yaoxuefeng
add dynamic mf size api
-
Committed by JZ-LIANG
* auto parallel sharding base
* chmod
* add unittest
* set unittest cmake dist label
* revise code according to review
* chmod
-
Committed by liutiexing
* add align for WorkQueue
* add spinlock
* merge develop
* merge
* Add EventsWaiter
* Revert "Add EventsWaiter". This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.
* update OS info
* split host_event_recorder
* split host_event_recorder
* update
* update
* update
* update
* update
* update
* update
Co-authored-by: liutiexing <liutiexing@google.com>
-
Committed by Huihuang Zheng
Fix Buddy Allocator random CI failure due to machine environment.
-
Committed by ykkk2333
-
Committed by TTerror
* add argsort/scatter for kunlun
* update test_scatter
* update xpu.cmake
* update xpu.cmake
* fix scatter
-
Committed by sneaxiy
-
Committed by Tao Luo
-
Committed by sneaxiy
-
Committed by limingshu
-
Committed by WangXi
-
- 28 Dec, 2021 12 commits
-
-
Committed by limingshu
* first commit
* pass ctest of elementwise_div_grad
-
Committed by From00
* fix reshape move storage error
* remove needless set type
* alloc tensor by shared storage
* Utilize StreamSafeCUDAAllocator to support fast GC in new executor
* Fix compile error for Windows and ROCm
* Fix compile error for Windows
* Modify UT stream_safe_cuda_alloc_test
* Modify UT stream_safe_cuda_alloc_test
* Rewrite fast GC
* Rewrite fast GC
* Fix compile error for BOOST_GET_CONST
* Fix compile error for BOOST_GET_CONST
* Changes default stream for StreamSafeCUDAAllocator
* Fix a small CI error
* Remove some redundant code
* Fix conflict
* Fix compile error for ROCm
* Fix Windows CI error
* Fix CI error
* Remove some unnecessary code
* Fix CI error
* Add UT for fast GC
* Fix CI error
* add device-agnostic stream class
* add stream.h
* fix ut
* fix cpu compile
* Use RWLock in GetAllocator
* Fix CI error
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
-
Committed by Jiabin Yang
* Rearranged Eager AutoCodeGen directory structure
* Removed USE_OP in Eager AutoCodeGen
* Enabled generation for Operators without Grad/Inputs/Outputs
* Resolved operators without input
* Fixed merge conflicts
* Enabled Eager AutoCodeGen for 10+ more operators
* Refactored Eager AutoCodeGen with more organized helper objects
* Enabled Eager AutoCodeGen for operators with multiple OpBases
* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
* Adjusted function generation/call between Python-C API & Dygraph API
* Synchronized auto-generated Python-C API with Dygraph Forward Functions
* support more eager tensor api
* fix merge compile error
* fix compile error and fit develop code
* support pure CPU
* fix some logic error in eager_mode
* support _varbase_creator in eager mode
* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
* for eager mode
* refine
* support multiple constructor for eager tensor
* add place related code
* polish code
* specific randint with dtype of int64
* Support pure cpu test
* eager logic
* refine test in pure cpu
* eager logic
* eager logic
* eager logic, test=develop
* skip core.eager when in inference, test=develop
* refine, test=develop
* refine, test=develop
* call RetainGrad after run forward kernel, test=develop
* refine, test=develop
* support dygraph util, meta, guard test
* support inference test
* refine test and fix initializer failed
* support create varbase and fix retain grad error
* fix windows error
* support test code coverage
* support test code coverage
* support test code coverage
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>
-
Committed by zyfncg
* refactor matmul directory in pten
* fix merge conflict
-
Committed by huangxu96
* add API and op for take_along_axis
* fix compile dependency problem and add example code and doc
* add unittest
* delete some code for CI coverage
* fix code style problem
* fix as review
-
Committed by Guoxia Wang
-
Committed by Tao Luo
* add amax/amin
* support axis as a list
-
Committed by chentianyu03
* remove intype arg in cast kernel
* modify conj config in api.yaml by dictionary order
* rm unused code in cast_kernel.cu
-
Committed by houj04
* add reduce_prod_xpu. fix reduce_mean_xpu bug.
* add reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun
-
Committed by Leo Chen
* add completion_nofifier
* fix bug
* unregister event waiter
-
Committed by baoachun
* add mul_lstm_fuse_pass ut
* update mul_lstm_fuse_pass ut
* update ut
* update ut
* update ut
* add CPU ut cmake setting
* update ut
-
Committed by Li Min
-
- 27 Dec, 2021 10 commits
-
-
Committed by WangXi
-
Committed by pangyoki
* fix accumulator bug
* fix unittest
-
Committed by ShenLiang
* fix bug in pfp16
* fix hip
* fix hip
-
Committed by baoachun
-
Committed by baoachun
* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut
* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut
* update ut
* update ut
-
Committed by Leo Chen
* add device-agnostic stream class
* add stream.h
* fix ut
* fix cpu compile
-
Committed by sneaxiy
-
Committed by limingshu
* No harm to KP
* Pass the compile stage
* change the WriteData function
* fix template bugs and pass ctest of current elementwise
* for passing partial template specialization of template function in CI-ROCm
* To make 'WriteData' function flexible.
* a less harmful way to support multi-output
* a less harmful way to support multi-output
-
Committed by baoachun
-
Committed by Guoxia Wang
-
- 26 Dec, 2021 1 commit
-
-
Committed by Chen Weihang
* add register general kernel macro
* move copy kernel impl
* revert needless change
* polish details
* fix xpu compile failed
* fix xpu compile failed
* polish format
-