- 29 Jan, 2022 6 commits
-
-
Committed by Guanghua Yu
-
Committed by JZ-LIANG
* support qkv fuse * support qkv fuse * update completion * update completion * update dist_split * rerun ci * is_auto_compatible added * is_auto_compatible added
-
Committed by QingshuChen
* fix kunlun2 softmax unitest bug *test=kunlun * minor
-
Committed by Jack Zhou
-
Committed by hlygit66666
* add fuse_relu_depthwise_conv_pass unittest * fix atol and rtol * fix according to review * Update test_dist_fuse_relu_depthwise_conv_pass.py
-
Committed by Leo Chen
-
- 28 Jan, 2022 18 commits
-
-
Committed by Zhanlue Yang
* Removed debug info * Added automatic code generation for final state Eager Dygraph * Modified backward yaml * Added EagerUtils helper functions for final state CodeGen * Adjusted CMakeFiles to support compilation for final state auto generated codes * Fixed final state eager codegen * Fixed CI problems * Fixed yaml.load() method failure * Turned final state codegen off for now * Fixed minor issue
-
Committed by Zhanlue Yang
-
Committed by Chen Weihang
* update forward argument mapping * fix compile failed * fix test failed
-
Committed by From00
-
Committed by zhangkaihuo
-
Committed by Roc
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2. * split template * Add Profiler and HostTracer * update * update * update * updateg * fix cmake Co-authored-by: liutiexing <liutiexing@google.com>
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2. * Set thread name for WorkQueue Co-authored-by: liutiexing <liutiexing@google.com>
-
Committed by YuanRisheng
-
Committed by YuanRisheng
* refactor scale kernel that its input is selected_rows * complement upload file
-
Committed by hong
* move digamma to pten; test=develop * fix mutable_data bugs; test=develop * remove useless code; test=develop * remove kernel compute; test=develop * fix bug; test=develop
-
Committed by wenbin
* slice * shuffle pass enhancement
-
Committed by Fan Zhang
* [PSLIB] Add Metrics Module, Support User-defined Add Metric * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI Coverage * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI * [PSLIB] Modify According to CI Coverage * [PSLIB] Modify According to CI Coverage * [PSLIB] Modify According to CI Coverage * modify role_maker * update CMakeLists.txt
-
Committed by zyfncg
* remove remake densetensor * fix eager test error * fix bug in eager * implement AllocateFrom * remove WriteBackOutput * fix problem of eager Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
-
Committed by Baibaifan
-
Committed by Weilong Wu
* Refactor TensorAdd func by template and remove gradient_accumulation in eager * Remove needless target name * Use overload instead of template
-
Committed by zyfncg
-
Committed by Weilong Wu
* implement AllocateFrom * fix PR-CI-Coverage timeout in 120s Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
-
- 27 Jan, 2022 16 commits
-
-
Committed by zhangkaihuo
-
Committed by Siming Dai
* add the test case for the UVA * add the context load for the uva * Add graph_sample kernel * Add graph_sample commit * add new commit for graph_sample * add unsigned long long int * delete some remarks * add cpu version * add cuda eids * add cpu eids * delete _uva * optimize speed: emplace_back, last_layer * add to_uva_tensor * add cpu return_eids choice * add gpu return_eids choice * add cpu reindex_nodes * add gpu reindex_nodes * rename op and add OMP for cpu * add incubate api * fix the compile problem for the PADDLE_ENFORE and different device * fix the rcom and windows compile problem * add unittest for graph_sample_neighbors * fix cpu unittest and unique problem * fix uva unittest, fix cuda unique problem * fix the windows compile problem * fix the windows rand_r compile problem * add correct unittest, add src_eids dispensable * delete black * combine uva unittest * mv Sample_index to Sample_Index; check input shape; fix random sample func * delete memset & cudaMemset * fix according to PR comments * fix rocm ci * modify function names according to the specification * fix windows_openblas ci * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors * fix rocm ci * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc * add data type * fix conflict Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
-
Committed by Leo Chen
-
Committed by Chen Weihang
* add constructor for win * change impl * fix bug
-
Committed by zyfncg
* remove remake densetensor * fix eager test error * fix bug in eager
-
Committed by YuanRisheng
-
Committed by Aurelius84
* Support allocate_from in Tensor and allocate_data in Context * fix #ifdef CUDA * fix cycle depends * fix test_xxx_dev_api failed * fix windows compiling error * fix unittest * modify into PImpl * fix selected rows * add TODO comment * refine interface according reviewer
-
Committed by Chen Weihang
* add infermeta registry * add infermeta registry * add unittest * polish details
-
Committed by Qi Li
-
Committed by Aganlengzi
* [Demo] custom kernel based on pten kernel * merge and npu custom work well * del comments * delete other code * fix CUDAContext * fix not found small_vector.h * support NPU * fix NPUContext * fix DeviceContext support * add UT * fix call * add UT * fix * fix for comments and ut * add MACRO control * fix multi input output * support env CUSTOM_DEVICE_ROOT * deal with special cases * fix for Windows * try coverage with test_custom_kernel_dot.py * fix test_custom_kernel_dot * fix test_custom_kernel_dot * fix merge * fix merge * fix CI * update * merge and fix * remove WITH_CUSTOM_KERNEL * fix merge * merge and fix * fix ut * fix ut for mac * add more UT * add more UT * fix
-
Committed by zhouweiwei2014
-
Committed by joanna.wozna.intel
* Update pass in quant2_int8_mkldnn_pass * Back to the previous scale_matmul order * Change place of cpu_quantize_placement_pass
-
Committed by chentianyu03
* add full_kernel xpu * fix full xpu register device type error * fix full kernel bug * add fulllike kernel impl and replace with raw kernel * fix dev_ctx convert template args error * modify namespace and header file * add isinf check * fix input type args in TensorSetConstantXPU error
-
Committed by wenbin
* shuffle channel pass * add ut * timeout fix * makefile fix
-
Committed by caozhou
* update planner * update unitest * update dist matmul * update auto converter
-
Committed by QingshuChen
* optimize kunlun/xpu softmax_with_cross_entropy add add unitest *test=kunlun * minor *test=kunlun * minor *test=kunlun * minor *test=kunlun * minor *test=kunlun
-