1. 09 Feb, 2022 (10 commits)
  2. 08 Feb, 2022 (7 commits)
  3. 07 Feb, 2022 (2 commits)
  4. 06 Feb, 2022 (1 commit)
  5. 04 Feb, 2022 (1 commit)
  6. 02 Feb, 2022 (1 commit)
  7. 30 Jan, 2022 (1 commit)
  8. 29 Jan, 2022 (2 commits)
    • Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09
      Li Min committed
      * Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.
      
      * Remove useless code.
      
      * Remove useless code.
      
      * Optimize layer_norm fwd when cols is 1024.
      
      * Remove useless code.
      
      * Minors.
      
      * Minors.
      
      * Modifications according to reviews.
      
      * Minors.
      
      * Optimize layer_norm bwd kernel when cols is 1024.
      
      * Polish layer_norm_bwd_1024 kernel.
      
      * Limit ln_bwd_1024_kernel to paddle_with_cuda.
      
      * Fix double type compile error.
      
      * Add optimization of ln bwd for fused_dropout_add_ln op.
      
      * Polish codes.
    • [PTen] Tidy pten core headers (#39188) · dd990981
      Chen Weihang committed
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tidy header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflict
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage
  9. 28 Jan, 2022 (3 commits)
  10. 27 Jan, 2022 (9 commits)
    • Add Khop Graph Sampler API (#39146) · 35f949b5
      Siming Dai committed
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for PADDLE_ENFORCE and different devices
      
      * fix the rocm and Windows compile problems
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
    • [pten] remove concat fluid kernel (#39268) · 552db8dc
      Leo Chen committed
    • [PTen] Remove ReMakePtenDenseTensor (#39094) · 98c1829b
      zyfncg committed
      * remove remake densetensor
      
      * fix eager test error
      
      * fix bug in eager
    • refactor elementwise sub grad (#39225) · 7a1e1193
      YuanRisheng committed
    • [MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
      Qi Li committed
    • [pten] add full xpu kernel (#39172) · 93839717
      chentianyu03 committed
      * add full_kernel xpu
      
      * fix full xpu register device type error
      
      * fix full kernel bug
      
      * add full_like kernel impl and replace with raw kernel
      
      * fix dev_ctx convert template args error
      
      * modify namespace and header file
      
      * add isinf check
      
      * fix input type args in TensorSetConstantXPU error
    • optimize kunlun/xpu softmax_with_cross_entropy and add unittest (#39180) · 2b9bb8bb
      QingshuChen committed
      * optimize kunlun/xpu softmax_with_cross_entropy and add unittest
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
    • Fix slice error in jit.to_static mode (#39251) · c0f993f6
      zyfncg committed
      * fix slice bug
      
      * fix syntax error
    • move math_cuda_utils.h to pten/kernels/funcs (#39246) · 809a10b6
      Feiyu Chan committed
  11. 26 Jan, 2022 (3 commits)
    • [pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1
      Leo Chen committed
      * update cmake file to remove fluid kernel
      
      * add pten declaration.h to where pybind.h used
      
      * fix sync_bn and tensorrt_engine
      
      * refine detection_library
      
      * fix interpreter_core
      
      * support eager legacy
      
      * fit eager legacy for pten
      
      * fall back to cpu if not found kernel
      
      * fix compile problem
      
      * fix compile problem
      
      * refine fallback logic
      
      * fit operator.run()
      
      * fix xpu compile
      
      * fit for new_exec
      
      * add REGISTER_OP_WITHOUT_GRADIENT
      
      * un-cache pt_kernel_context
      
      * fix compile
      
      * fix cudnn
      
      * fix compiling with on_infer
      
      * fix mkldnn
      
      * fix isfinite_v2
      
      * fix xpu problem
      
      * fix op_device
      
      * refine fallback for xpu
      
      * fix xpu compile
      
      * merge develop
      
      * refine code format
      
      * fix compile
      
      * fix compile
      
      * add data_transfer
      
      * fix PreparePtenData
      
      * fix cpu context
      
      * merge develop
      
      * fix compile
      
      * fix error device context
      
      * fix xpu
      
      * fix dev_ctx
    • [pten] Cast xpu kernel (#39179) · 93d2f0a6
      chentianyu03 committed
      * cast xpu kernel init
      
      * cast xpu kernel
      
      * replace with raw cast xpu kernel
      
      * fix cast kernel bug
      
      * add the missing break
      
      * modify namespace and header file
    • [MLU] Add conv2d op (#39110) · 71634a61
      qipengh committed
      * [MLU]Add conv2d op
      
      * [MLU]fix comment
      
      * [MLU]adapt NCHW of conv2d op