    Combine multiple paddle::memory::allocate operations into one for ops (#49126) · bdae5481
    Committed by limingshu
    * A first attempt at using cudaLaunchCooperativeKernel
    
    * fix bugs
    
    * Completely replace the lar CUDA kernel
    
    * Fix bugs
    
    * fix code according to comments
    
    * fix code according to review comments
    
    * add some function overloads
    
    * relocate the power operation.
    
    * add bf16 support for index-select-related ops
    
    * revert bf16 type change.
    
    * add changes for more ops
    
    * fix code writing bugs
values_vectors_functor.h 21.7 KB