1. 30 1月, 2022 7 次提交
  2. 29 1月, 2022 13 次提交
    • R
      fix paddle.where broadcast bug (#39182) · 92253f11
      ronnywang 提交于
      92253f11
    • L
      Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09
      Li Min 提交于
      * Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op.
      
      * Remove useless code.
      
      * Remove useless code.
      
      * Optimize layer_norm fwd when cols is 1024.
      
      * Remove useless code.
      
      * Minors.
      
      * Minors.
      
      * Modifications accordding to reviews.
      
      * Minors.
      
      * Optimize layer_norm bwd kernel when cols is 1024.
      
      * Polish layer_norm_bwd_1024 kernel.
      
      * Limit ln_bwd_1024_kernel to paddle_with_cuda.
      
      * Fix double type compile error.
      
      * Add optimization of ln bwd for fused_dropout_add_ln op.
      
      * Polish codes.
      99cfcc09
    • L
      Add xpu2 compiler (#37254) · 92da5055
      Liu-xiandong 提交于
      * Add XPU compiler for paddle, test=develop
      
      * clean code
      
      * clean useless code
      
      * clean useless code
      
      * clean useless code
      
      * test
      
      * add include path
      
      * use clang compiler
      
      * xpu2.cmake
      
      * XPU2 compiler passed
      
      * update
      
      * update after pten
      
      * combination the WITH_XPU and WITH_XPU2
      
      * update the fuse operation in WITH_XPU and WITH_XPU2
      
      * update
      
      * update
      
      * update
      
      * fix the merge error
      
      * update
      
      * update the code
      
      * update the code
      
      * add run_kp_kernel flag
      
      * update
      
      * update
      
      * fix prepared type_ bug
      
      * clean and update the code
      
      * reset the kernel_primitives
      
      * update
      
      * clean the code
      
      * delete useless comment
      
      * fix the bug in WITH_XPU
      
      * update
      
      * update
      
      * modify the abi
      
      * delete some useless code
      
      * Parameter automation in xpu compilation
      
      * Parameter automation in xpu compilation
      
      * delete kps in cmake
      
      * delete useless comment
      
      * clean the code
      
      * clean the code
      92da5055
    • C
      rename utils to manual (#39320) · 96bcf2df
      Chen Weihang 提交于
      96bcf2df
    • C
      [PTen] Tidy pten core headers (#39188) · dd990981
      Chen Weihang 提交于
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tify header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflit
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage
      dd990981
    • T
      Symbolic Hessian (#39221) · 64e7c715
      Tongxin Bai 提交于
      * [autograd] static Jacobian pass tests.
      
      * [autograd] apply CR suggested changes.
      
      * [autograd] more tests.
      
      * [autograd] add CPUPlace in tests.
      
      * [autograd] bug fixes.
      
      * [autograd] reformatted.
      
      * [autograd] adding Hessian, in progress.
      
      * [autograd] Hessian passes. A double grad bug fixed.
      
      * [autograd] fix renaming conflict in double backward pass.
      
      * [autograd] polish test.s
      
      * fix a bug when using brackets
      
      * debug for ci
      
      * [autograd] fixing Hessian test.
      
      * polish format.
      Co-authored-by: Nlevi131 <83750468+levi131@users.noreply.github.com>
      Co-authored-by: Nlevi131 <limaolin01@baidu.com>
      64e7c715
    • Z
    • G
      fix FakeQuantAbsMax in QAT (#39307) · 34d97c57
      Guanghua Yu 提交于
      34d97c57
    • J
      Auto parallel/qkv fuse (#39080) · fdedf909
      JZ-LIANG 提交于
      * support qkv fuse
      
      * support qkv fuse
      
      * update completion
      
      * update completion
      
      * update dist_split
      
      * rerun ci
      
      * is_auto_compatible added
      
      * is_auto_compatible added
      fdedf909
    • Q
      fix kunlun2 softmax unitest bug (#39274) · 23bb2836
      QingshuChen 提交于
      * fix kunlun2 softmax unitest bug
      *test=kunlun
      
      * minor
      23bb2836
    • J
    • H
      Add FuseReluDepthwiseConvPass and unittest (#39105) · 9d6e8202
      hlygit66666 提交于
      * add fuse_relu_depthwise_conv_pass unittest
      
      * fix atol and rtol
      
      * fix according to review
      
      * Update test_dist_fuse_relu_depthwise_conv_pass.py
      9d6e8202
    • L
      7b4916c4
  3. 28 1月, 2022 18 次提交
  4. 27 1月, 2022 2 次提交
    • Z
      implement AllocateFrom (#39280) · d89f246c
      zhangkaihuo 提交于
      d89f246c
    • S
      Add Khop Graph Sampler API (#39146) · 35f949b5
      Siming Dai 提交于
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for the PADDLE_ENFORE and different device
      
      * fix the rcom and windows compile problem
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
      35f949b5