- 25 Apr, 2022 2 commits
-
-
Committed by BrilliantYuKaimin
* Add infermeta for ChannelShuffle * Create channel_shuffle_grad_kernel.h * Create channel_shuffle_kernel.h * Create channel_shuffle_sig.cc * Create channel_shuffle_op.cc Description of the ChannelShuffle operator * Create channel_shuffle_kernel_impl.h Implementation of the ChannelShuffle kernel * Create channel_shuffle_grad_kernel_impl.h Implementation of the ChannelShuffle backward kernel * Add kernel register of channel shuffle and grad Register the ChannelShuffle kernel and its backward kernel * add nn.functional.channel_shuffle * add nn.ChannelShuffle * Create test_channel_shuffle.py * Update example of ChannelShuffle in vision.py * Update test_channel_shuffle.py * Change where the channel_shuffle kernel is implemented * Fix code formatting * Remove extra whitespace * Improve error checking in channel_shuffle * Update unary.cc * Update channel_shuffle_op.cc * Update test_channel_shuffle.py * Update unary.cc * add channel_shuffle * Update test_channel_shuffle.py * Update vision.py * Adjust code formatting * Update channel_shuffle_sig.cc * Update the ChannelShuffle documentation * Update the channel_shuffle documentation * remove ChannelShuffleOpArgumentMapping * add ChannelShuffleGradInferMeta * Update channel_shuffle_op.cc * Adjust the location of the channel_shuffle kernel and its gradient kernel
-
Committed by Chen Weihang
-
- 23 Apr, 2022 1 commit
-
-
Committed by Aurelius84
* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT * [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT
-
- 22 Apr, 2022 1 commit
-
-
Committed by zhangkaihuo
-
- 21 Apr, 2022 1 commit
-
-
Committed by sneaxiy
* support int16 argmax kernel * add fp16 test
-
- 20 Apr, 2022 1 commit
-
-
Committed by BrilliantYuKaimin
* Add the operator description for logspace * Add shape inference for logspace * Add the logspace kernel implementation * Add the logspace API in Python * Add unit tests for logspace * Add logspace * Update logspace_kernel.cu * Update logspace_op.cc * Adjust code formatting * Update doc of logspace * Update tensor.py * Update logspace_op.cc * Update logspace_kernel.cc * Update logspace_kernel.cu * Update test_logspace.py * Adjust the location of logspace * Adjust code formatting
-
- 19 Apr, 2022 1 commit
-
-
Committed by YuanRisheng
[Phi]Separate AddKernel/DivideKernel/SubtractKernel/MultiplyKernel from ElementwiseKernel(Part1) (#41806) * separate add/div/sub/mul from elementwise * delete code * fix compile bugs * deal with conflict * fix bugs when compile * fix windows unit test bug * fix ci coverage bugs
-
- 18 Apr, 2022 3 commits
-
-
Committed by Lijunhui
-
Committed by zhangkaihuo
-
Committed by Siming Dai
* add eids result for graph_sample_neighbors * fix bug * move fisher_yates sample to warp * add cpu eid output * delete comment * delete comment * change nullptr placeholder * optimize sample kernel * fix mutable_data
-
- 17 Apr, 2022 1 commit
-
-
Committed by Chen Weihang
* split phi and fluid infermeta context * resolve conflict * fix type error * optimize scheduling perf * spec small vector size * replace all grad var name * fix test failed * move init default signature * polish details * polish details * fix no init bug * init sig for tests * add init sig for infer * fix infrt error * fix infrt failed * fix kunlun error * fix infrt failed
-
- 16 Apr, 2022 1 commit
-
-
Committed by 王明冬
-
- 15 Apr, 2022 5 commits
-
-
Committed by chentianyu03
* split reduce_kernel * rm reduce_kernel in cmake * split reduce_grad kernels * fix cmake build error * format code * fix standalone_executor_test error
-
Committed by Zhanlue Yang
* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad * Fixed elementwise issue * Addressed CI failures * [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode * [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode * Enabled more test cases * [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode * Adjusted test_imperative_star_gan_with_gradient_penalty.py
-
Committed by zhangkaihuo
-
Committed by limingshu
* change cudnn helper for auto-tune * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm. * Fix the bug in calculating and printing current step cache hit rate. * Improve the autotune cache and fix unittest. * Change the key from AlgorithmType to int64_t. * Fix unittest for cpu-only env. * change ChooseAlgoByWorkspace for heuristic mode Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
Committed by hong
* try to fix batch norm memory issue * fix batch norm memory alloc bug * polish some code
-
- 14 Apr, 2022 3 commits
-
-
Committed by Lijunhui
* register elementwise_xxx
-
Committed by Aurelius84
-
Committed by Chen Weihang
* change dispatch to visit * resolve conflict
-
- 13 Apr, 2022 2 commits
-
-
Committed by hong
* add expand, poisson * add poisson grad * add expand equal_all poisson triangular solve yaml
-
Committed by zhangkaihuo
-
- 12 Apr, 2022 8 commits
-
-
Committed by hong
* add layer norm infermeta * add layer norm yaml * polish layer norm infer meta * add layer norm to black list
-
Committed by chentianyu03
* exchange assign and assign_raw kernel name * fix register error
-
Committed by hong
-
Committed by Lijunhui
* init commit no push * collect compile errors * bitwise UT * fix compile problem * cancel comments * restore miss deletion * fix compilation * fix UT * NO stash in multiple branch at the same times * fix error * combine .cu from gpu and kps * replace gpu by kps * fix by Chen-weihang * Revert "Fix kps compile error in Junhui logic compare bitwise" * fix backend test * rm comments Co-authored-by: Chen Weihang <chenweihang@baidu.com>
-
Committed by wuyefeilin
-
Committed by Zhanlue Yang
* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad * Fixed elementwise issue * Addressed CI failures
-
Committed by Aurelius84
* [Phi]Fix beta1_pow/beta2_pow/skip_update data transform problem in adam/adamw * fix xpu unittest failed
-
Committed by FlyingQianMM
add an inner loop for index_select_grad_init() in index_select op when dealing with large-shape data (#41563) * replace for with CUDA_KERNEL_LOOP for index_select_grad_init() in index_select op * use CUDA_KERNEL_LOOP_TYPE * fix code style * replace index_select_grad_init with SetConstant
-
- 11 Apr, 2022 3 commits
-
-
Committed by YuanRisheng
* add multi_dot,maxout,multiplex yaml * add code coverage
-
Committed by chentianyu03
* add assign yaml * add assign api * add assign backward api * add assign * add assign yaml * add assign * assign yaml * add assign raw kernel and use assign_raw in yaml * merge develop branch * add missing python_api
-
Committed by sneaxiy
-
- 10 Apr, 2022 1 commit
-
-
Committed by Chen Weihang
-
- 09 Apr, 2022 2 commits
-
-
Committed by hong
-
Committed by limingshu
* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kinds of workspace setting methods. Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
- 08 Apr, 2022 1 commit
-
-
Committed by Jack Zhou
-
- 07 Apr, 2022 3 commits
-
-
Committed by zhouweiwei2014
-
Committed by YuanRisheng
* add yaml * perfect coverage
-
Committed by zhouweiwei2014
-