1. 09 Feb, 2022 5 commits
    • J
      Replace EagerTensor with Tensor (#39376) · 945a3ce9
      Jiabin Yang committed
      * merge legacy to fluid
      
      * Remove legacy code
      
      * Remove legacy code
      
      * Remove DataType test
      
      * Using Tensor directly instead of using EagerTensor
      
      * support gradient_accumulation
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * make test_imperative_lod_tensor_to_selected_rows longer
    • H
      Move trace op to pten (#39227) · d7dddf94
      hong committed
      * add trace op
      
      * bug fix
      
      * bug fix; test=develop
      
      * thrust bug fix; test=develop
      
      * remove useless register; test=develop
      
      * fix bug; test=develop
      
      * update trace kernel; test=develop
      
      * move kernel args to trace_sig; test=develop
    • C
      [CustomOp] Fix slice bug of custom op (#39393) · 91b074a2
      Chen Weihang committed
      * fix slice bug of custom op
      
      * add offset in check
    • S
      eaa3fd45
    • H
      Move norm to pten (#39324) · ece200b3
      hong committed
      * add norm cpu
      
      * update code;
      
      * norm bug fix
      
      * move norm op to pten; test=develop
      
      * move norm op to pten; test=develop
      
      * add norm util; test=develop
      
      * fix norm npu bug; test=develop
      
      * fix norm kernel bug; test=develop
      
      * move kernel args to pten; test=develop
      
      * move kernel args to pten sig; test=develop
  2. 08 Feb, 2022 9 commits
    • S
      Make Embedding layer support more int ids type (#39381) · 60f1461a
      sneaxiy committed
      * add more int id type support for embedding
      
      * add ut
      
      * add more ut
      
      * fix ci error
    • H
      Add FuseOptimizerPass and test_dist_fuse_adam_pass unittest. (#39208) · ccdcfa2d
      hlygit66666 committed
      * add fuse_relu_depthwise_conv_pass unittest
      
      * fix atol and rtol
      
      * fix according to review
      
      * Add FuseOptimizerPass and fuse_adam_pass unittest
      
      * add sgd and momentum unittest
      
      * add fuse_optimizer_pass
      
      * close amp
      
      * close amp
      
      * update
      
      * fix run on two cards
      
      * Update test_dist_fuse_adam_pass.py
      
      * Update test_dist_fuse_momentum_pass.py
      
      * Update test_dist_fuse_sgd_pass.py
      
      * Create test_dist_fuse_sgd_pass.py
      
      * Create test_dist_fuse_sgd_pass.py
      
      * Create test_dist_fuse_sgd_pass.py
      
      * Update test_dist_fuse_adam_pass.py
      
      * Update test_dist_fuse_momentum_pass.py
      
      * Update test_dist_fuse_sgd_pass.py
    • Z
      ps optimize refactor (#38982) · 196dbfc2
      ziyoujiyi committed
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * refactor ps optimize
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * refactor theoneps
      
      * the_one_ps
      
      * add ps pass unittest
      
      * add ps pass unittest
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * ps unittest frame
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * add cpu_async_ps_mode test
      
      * ps unittest ready
      
      * ps unittest ready
      
      * solve dist_pass init conflict
      
      * solve import CommContext error
      
      * unittest ok
      
      * implement AllocateFrom
      
      * solve setup.py.in conflict
      
      * solve conflict
      
      * solve conflict
      
      * solve conflict
      
      * .
      
      * .
      Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>
    • Z
      [bf16] add bf16 cuda kernel: concat and split (#39380) · de0bad2a
      zhangbo9674 committed
      * add concat & split
      
      * add concat kernel
      
      * add concat unittest
      
      * add split unittest
    • W
      0fee0044
    • B
      optimize sharding stage3 (#39334) · 23d559dd
      Baibaifan committed
    • C
      Fix reduce_sum dtype dispatch bug on gpu (#39349) · 4d7ad277
      Chen Weihang committed
      * fix pten reduce dispatch bug
      
      * add cast before reduce
      
      * fix test failed
    • L
      [bf16] support printing bf16 tensor (#39375) · f57b21e6
      Leo Chen committed
    • S
      Add __PD_DEFINE_RAW_OP_KERNEL_FUNC for registering custom op kernel with ExecutionContext (#39352) · 5c3873f6
      sneaxiy committed
      * hack custom op
      
      * add ut
      
      * skip windows ci
  3. 07 Feb, 2022 5 commits
  4. 30 Jan, 2022 3 commits
  5. 29 Jan, 2022 3 commits
  6. 28 Jan, 2022 4 commits
  7. 27 Jan, 2022 11 commits
    • S
      Add Khop Graph Sampler API (#39146) · 35f949b5
      Siming Dai committed
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for the PADDLE_ENFORCE and different device
      
      * fix the rocm and windows compile problem
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
    • A
      [PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c
      Aurelius84 committed
      * Support allocate_from in Tensor and allocate_data in Context
      
      * fix #ifdef CUDA
      
      * fix cycle depends
      
      * fix test_xxx_dev_api failed
      
      * fix windows compiling error
      
      * fix unittest
      
      * modify into PImpl
      
      * fix selected rows
      
      * add TODO comment
      
      * refine interface according to reviewer
    • A
      [PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215
      Aganlengzi committed
      * [Demo] custom kernel based on pten kernel
      
      * merge and npu custom work well
      
      * del comments
      
      * delete other code
      
      * fix CUDAContext
      
      * fix not found small_vector.h
      
      * support NPU
      
      * fix NPUContext
      
      * fix DeviceContext support
      
      * add UT
      
      * fix call
      
      * add UT
      
      * fix
      
      * fix for comments and ut
      
      * add MACRO control
      
      * fix multi input output
      
      * support env CUSTOM_DEVICE_ROOT
      
      * deal with special cases
      
      * fix for Windows
      
      * try coverage with test_custom_kernel_dot.py
      
      * fix test_custom_kernel_dot
      
      * fix test_custom_kernel_dot
      
      * fix merge
      
      * fix merge
      
      * fix CI
      
      * update
      
      * merge and fix
      
      * remove WITH_CUSTOM_KERNEL
      
      * fix merge
      
      * merge and fix
      
      * fix ut
      
      * fix ut for mac
      
      * add more UT
      
      * add more UT
      
      * fix
    • Z
      fix UT test_lr_scheduler random fail (#39254) · 7e6a2190
      zhouweiwei2014 committed
    • J
      Update passes in quant2_int8_mkldnn_pass (#38912) · 0e235e58
      joanna.wozna.intel committed
      * Update pass in quant2_int8_mkldnn_pass
      
      * Back to the previous scale_matmul order
      
      * Change place of cpu_quantize_placement_pass
    • W
      fix shuffle_channel_detect_pass (#39242) · af9ddeb7
      wenbin committed
      * shuffle channel pass
      
      * add ut
      
      * timeout fix
      
      * makefile fix
    • C
      【Auto Parallel】Update Planner (#39201) · f2226441
      caozhou committed
      * update planner
      
      * update unittest
      
      * update dist matmul
      
      * update auto converter
    • Q
      optimize kunlun/xpu softmax_with_cross_entropy and add unittest (#39180) · 2b9bb8bb
      QingshuChen committed
      * optimize kunlun/xpu softmax_with_cross_entropy and add unittest
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
    • C
      【Auto Parallel】update dist param grad for pass (#38941) · cac6f408
      caozhou committed
      * update dist param grad for pass
      
      * update unittest
      
      * update unittests
      
      * fix conflict
    • W
      [Paddle-Inference]: fix concat slice (#39096) · f080e8d5
      Wangzheee committed
      * Paddle-Inference:fix_concat_slice
      
      * Paddle-Inference:fix_concat_slice
      
      * Paddle-Inference:fix_concat_slice
      
      * Paddle-Inference:fix_concat_slice
      
      * [Paddle-Inference]: fix concat slice
      
      * [Paddle-Inference]: fix concat slice
      
      * [Paddle-Inference]: fix concat slice
    • H
      Take/Put_along_axis more input size support (#39072) · 41a64351
      huangxu96 committed
      Support cases where the indices shape size is larger than the arr shape size