1. 02 Feb, 2022 1 commit
  2. 30 Jan, 2022 1 commit
  3. 29 Jan, 2022 2 commits
    • Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09
      Li Min committed
      * Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op.
      
      * Remove useless code.
      
      * Remove useless code.
      
      * Optimize layer_norm fwd when cols is 1024.
      
      * Remove useless code.
      
      * Minors.
      
      * Minors.
      
      * Modifications according to reviews.
      
      * Minors.
      
      * Optimize layer_norm bwd kernel when cols is 1024.
      
      * Polish layer_norm_bwd_1024 kernel.
      
      * Limit ln_bwd_1024_kernel to paddle_with_cuda.
      
      * Fix double type compile error.
      
      * Add optimization of ln bwd for fused_dropout_add_ln op.
      
      * Polish codes.
    • [PTen] Tidy pten core headers (#39188) · dd990981
      Chen Weihang committed
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tidy header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflict
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage
  4. 28 Jan, 2022 3 commits
  5. 27 Jan, 2022 9 commits
    • Add Khop Graph Sampler API (#39146) · 35f949b5
      Siming Dai committed
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for the PADDLE_ENFORCE and different device
      
      * fix the rocm and windows compile problem
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
    • [pten] remove concat fluid kernel (#39268) · 552db8dc
      Leo Chen committed
    • 【PTen】Remove ReMakePtenDenseTensor (#39094) · 98c1829b
      zyfncg committed
      * remove remake densetensor
      
      * fix eager test error
      
      * fix bug in eager
    • refactor elementwise sub grad (#39225) · 7a1e1193
      YuanRisheng committed
    • [MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
      Qi Li committed
    • [pten] add full xpu kernel (#39172) · 93839717
      chentianyu03 committed
      * add full_kernel xpu
      
      * fix full xpu register device type error
      
      * fix full kernel bug
      
      * add fulllike kernel impl and replace with raw kernel
      
      * fix dev_ctx convert template args error
      
      * modify namespace and header file
      
      * add isinf check
      
      * fix input type args in TensorSetConstantXPU error
    • optimize kunlun/xpu softmax_with_cross_entropy and add unittest (#39180) · 2b9bb8bb
      QingshuChen committed
      * optimize kunlun/xpu softmax_with_cross_entropy and add unittest
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
      
      * minor
      *test=kunlun
    • Fix slice error in jit.to_static mode (#39251) · c0f993f6
      zyfncg committed
      * fix slice bug
      
      * fix syntax error
    • move math_cuda_utils.h to pten/kernels/funcs (#39246) · 809a10b6
      Feiyu Chan committed
  6. 26 Jan, 2022 9 commits
  7. 25 Jan, 2022 12 commits
  8. 24 Jan, 2022 3 commits