- 18 Apr, 2022 1 commit
-
-
Committed by TeFeng Chen
cinn_launch_op: optimize the overhead of preparing variables before executing cinn compiled program (#41777) * optimize preparation overhead before executing cinn compiled program * update code notes * fix flag annotation * add a flag of auto-tune feature beforehand
-
- 17 Apr, 2022 2 commits
-
-
Committed by Fan Zhang
* Adapt XPUPS - 1st version - 3.24 * Adapt XPUPS - update XPU PushSparse - 2nd version - 3.24 * Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25 * refactor heter comm kernel * update. test=develop * Adapt XPUPS - modify by compilation - 4th version - 3.27 * update calc_shard_offset. test=develop * update xpu kernel. test=develop * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * heter_comm update * heter_comm update * update calc_shard_offset. test=develop * heter_comm update * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * fix. test=develop * update. test=develop * update. test=develop * update optimizer kernel * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30 * update. test=develop * update pslib.cmake * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * Adapt XPUPS - modify by kp compilation - 6th version - 3.30 * update. test=develop * update. test=develop * update. test=develop * update optimizer kernel * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * used by minxu * update heter_comm_inl * fix. test=develop * Adapt XPUPS - modify by kp compilation - 7th version - 3.30 * fix. test=develop * add optimizer kernel. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 3.31 update * Adapt XPUPS - update kp compilation path - 8th version - 3.31 * add optimizer kernel. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. 
test=develop * update heter_comm_kernel.kps 3.31 * fix. test=develop * fix. test=develop * update heter_comm_kernel.kps 3.31 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update heter_comm.h 3.31 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update hashtable. test=develop * update. test=develop * Adapt XPUPS - update by kp compilation - 9th version - 4.1 * update hashtable. test=develop * fix. test=develop * update hashtable 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 10th version - 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * update. test=develop * modify by compilation 4.1 * update. test=develop * update. test=develop * fix. test=develop * modify by compilation 4.1 * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * modify by compilation 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * modify by compilation 4.1 19:30 * fix. test=develop * update ps_gpu_wrapper.kps 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 11th version - 4.1 * fix. test=develop * Adapt XPUPS - update by kp compilation - 12nd version - 4.2 * fix. test=develop * fix. test=develop * modify by compilation 4.2 * 4.2 update * fix. test=develop * template init. test=develop * update 4.6 * fix. test=develop * template init. test=develop * 4.6 modify by compilation * hashtable template init. test=develop * hashtable template init. test=develop * fix. test=develop * fix. test=develop * fix. test=devlop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. 
test=develop * fix. test=devlop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 13nd version - 4.7 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 4.11 update * fix. test=develop * fix. test=develop * 4.11 update * update by pre-commit * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 4.12 update * fix. test=develop * Adapt XPUPS - update by kp compilation - 14th version - 4.13 * 4.13 update * 4.14 update * 4.14 update * 4.14 update * 4.14 modify by merged latest compilation * retry CI 4.14 * 4.15 pass static check * 4.15 modify by gpups CI * 3.16 update by gpups CI - modify ps_gpu_wrapper.h * 4.16 update * 4.16 pass xpu compile * 4.16 retry CI * 4.16 update Co-authored-by: zmxdream <zhangminxu01@baidu.com>
-
Committed by Chen Weihang
* split phi and fluid infermeta context * resolve conflict * fix type error * optimize scheduling perf * spec small vector size * replace all grad var name * fix test failed * move init default signature * polish details * polish details * fix no init bug * init sig for tests * add init sig for infer * fix infrt error * fix infrt failed * fix kunlun error * fix infrt failed
-
- 15 Apr, 2022 6 commits
-
-
Committed by ziyoujiyi
* back fl * delete ssl cert * . * make warning * . * unittest paral degree * solve unittest * heter & multi cloud commm ready * . * . * arm_brpc compile * . * . * . * . * . * . * . * . * . * . * . * . * . * . * only output is ok * base is ok * . * . * . * . * . * . * . * . * add switch server bin * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * adapt brpc ssl * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
-
Committed by seemingwang
* extract sub-graph * graph-engine merging * fix * fix * fix heter-ps config * test performance * test performance * test performance * test * test * update bfs * change cmake * test * test gpu speed * gpu_graph_engine optimization * add ssd layer to graph_engine * fix allocation * fix syntax error * fix syntax error * fix pscore class * fix * recover test * recover test * fix spelling * recover * fix
-
Committed by chentianyu03
* split reduce_kernel * rm reduce_kernel in cmake * split reduce_grad kernels * fix cmake build error * format code * fix standalone_executor_test error
-
Committed by danleifeng
* add gpupsutil and afsclient; test=develop
-
Committed by zmxdream
* refactor heter comm kernel * update. test=develop * update calc_shard_offset. test=develop * update xpu kernel. test=develop * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * update. test=develop * fix. test=develop * update. test=develop * update. test=develop * update optimizer kernel * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * add optimizer kernel. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update hashtable. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * template init. test=develop * hashtable template init. test=develop * fix. test=develop * fix. test=devlop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix hashtable_kernel. test=develop * fix. 
test=develop * fix. test=develop * fix. test=develop * fix. test=develop Co-authored-by: WorgenZhang <frank08081993@gmail.com>
-
Committed by Allen Guo
* add mixed-precision support for ipu * restore cast_model_to_fp16 api * update UTs
-
- 14 Apr, 2022 6 commits
-
-
Committed by Lijunhui
* register elementwise_xxx
-
Committed by liutiexing
* executor perf statistics * fix ut * fix ut * fix ut * add ut * add ut
-
Committed by Jacek Czaja
* Add UT - Added missed data_layout - Added missing conversions - NDHWC added - NDHWC support in data_transform - another fix - condddate change - fix u- fix - fix - fix - fix - fix - fix to hack - compilation fix - fix to automatic merge * - reduced UT * - fix * - lint * - fix to lint
-
Committed by Sławomir Siwek
* Change tensor name to match activation * declare fc_eltwise_add pass * merge conv_eltwise refactor PR * first compilable draft * unittest feedback tools * Fuse pass tester * Move IsReachable() to shared file * 100% coverage of fuse_pass_tester.cc * register pass * Add bias node * Improve unit tests / remove bias node from pattern * improve fc_eltwiseadd_unittest * cancel eltwise_add fuse if act is already fused * Add elementwise_input scale * Residual MVP * Add new FC attrs * Add more test cases * Add missing op attrs * Adapt code to new Elementwise pattern * reuse existing fcpattern * improve code style * remove unused arguments * fix typo * remove whitespace * remove int8 related code * Remove attributes from base ops * style * style check * Remove input from base op * Set attribute during fuse * ut timeout * download and test model * DRY * apply feedback from review * Style check * fix typo * cosmetic changes * explicitly set residual as output * VIT-OCR accuracy check * trigger CI * remove whitespaces * fix missing data file
-
Committed by baoachun
* add mkldnn int8 pass [step3] * Add test for compute_propagate_scales_mkldnn_pass * update pass * update api comment and python api Co-authored-by: wozna <joanna.wozna@intel.com>
-
Committed by jakpiase
* added shuffle_channel bf16/fp32 fwd kernel * added missing files * CI fix * changed from pten to phi * tmp save * added reviewers suggestions * fix for test
-
- 13 Apr, 2022 5 commits
-
-
Committed by wangguanqun
* the one ps proto * the one ps proto * fix * fix * fix * fix windows ci * fix windows ci * add dependency * add dependency
-
Committed by zmxdream
[XPUPS] add support for kunlun2 Co-authored-by: WorgenZhang <frank08081993@gmail.com>
-
Committed by zyfncg
* remove stack_grad infershape * fix bug of output with null * fix bug
-
Committed by Thunderbrook
* optimize hbm * format * format
-
Committed by Chen Weihang
* remove old custom op placetype * replace dist placetype using * add with gpu macro * fix mutable_data error * fix set value error * add comment
-
- 12 Apr, 2022 3 commits
-
-
Committed by danleifeng
* perform SlotRecordInMemoryDataFeed feedvec;test=develop
-
Committed by Leo Chen
-
Committed by Chen Weihang
* add new method for custom double grad * add tanh double grad unittest * change year * revert tensor init method
-
- 10 Apr, 2022 3 commits
-
-
Committed by Liu-xiandong
* [KP]fix bug when TruncatedNormal cannot fall back in cpu * delete useless comment * delete useless comment
-
Committed by baoachun
-
Committed by baoachun
* add mkldnn int8 pass * add mkldnn int8 pass * update pass
-
- 09 Apr, 2022 3 commits
-
-
Committed by zhaocaibei123
* update name * update name * fix test * fix fleet bind * update name * update name * fix test * fix gpups wrapper * remove Push/Pull/Load/Save with context in client and wrapper base class * fix * fix * remove some interface * fix * remove * code style * recover * fix * remove code unused * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable * fix * fix * fix * recover * remove unused code * recover unittest * fix * remove * fix * remove code unuseful * remove * fix * recover * remove Co-authored-by: esythan <esythan@126.com>
-
Committed by limingshu
* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kinds of workspace setting methods. Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
Committed by Leo Chen
* fix bug that no thread is waked up when adding task to threadpool * fix typo
-
- 07 Apr, 2022 3 commits
-
-
Committed by Thunderbrook
* afs wrapper * format * format * macro
-
Committed by liutiexing
* Profile Executors * update * fix ut * fix names * update * update
-
Committed by houj04
* momentum support l2decay for xpu. test=kunlun * fix include file. test=kunlun * fix cmake for device_worker. test=kunlun
-
- 06 Apr, 2022 1 commit
-
-
Committed by Allen Guo
* remove paddle_ipu shared library * fix unique_name
-
- 05 Apr, 2022 2 commits
-
-
Committed by zyfncg
* fix bug of data transform in inference executor * fix bug
-
Committed by Leo Chen
* enable new executor by default * enable stream safe allocator * test=document_fix;test=coverage * do not use scope in op kernel * fit empty program for new executor * fix communication depend * fix test_sync_batch_norm * skip unsupported place * refine datatransfer * fit for distributed program * fix dependency * fix some ut
-
- 04 Apr, 2022 2 commits
-
-
Committed by Sławomir Siwek
* DRY * change nodes names * add const prefix * change asX to as_x in all files
-
Committed by hong
* add dropout slice yaml * remove useless code * fix infer shape error * skip infrt compile for dropout
-
- 03 Apr, 2022 1 commit
-
-
Committed by chentianyu03
* add concat_grad kernel * fix error * remove comment code * fix outs nullptr error * change to phi header * add concat_grad declare for standalone_executor_test
-
- 02 Apr, 2022 2 commits