1. 20 Dec, 2022: 1 commit
  2. 28 Nov, 2022: 1 commit
    • Cherrypick NV fixes to release/2.4 (#48263) · 7a0b8625
      zlsh80826 committed
      * Reduce squeeze2_matmul_fuse_pass and flatten test time (#47098)
      
      * Add missing fp32 config and reduce the testing combination
      
      * Reduce trt matmul pass test max examples
      
      * Loosen TRT fp16 test tolerance (#47100)
      
      * Loosen TRT half test tolerance to 1e-3 (#47101)
      
      * Loosen TRT half test tolerance to 1e-3 (#47106)
      
      * Update distributed_strategy.proto (#46531)
      
      * Close popen pipe after use (#47053)
      
      * Add launch_bounds (#47285)
      
      * Fix TRT UT failures (#47488)
      
      * Format cherry-picked commits
      
      * CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)
      
      * Skip tests that use fused_ops on H100
      
      * Add error message to FusedOps on H100
      Co-authored-by: Shijie <505749828@qq.com>
      Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
      Co-authored-by: Tian Zheng <tizheng@nvidia.com>
  3. 26 Oct, 2022: 1 commit
  4. 20 Oct, 2022: 1 commit
  5. 18 Oct, 2022: 1 commit
  6. 19 Sep, 2022: 2 commits
  7. 09 Sep, 2022: 1 commit
  8. 08 Sep, 2022: 2 commits
  9. 07 Sep, 2022: 1 commit
  10. 01 Sep, 2022: 1 commit
  11. 31 Aug, 2022: 1 commit
  12. 23 Aug, 2022: 1 commit
  13. 17 Aug, 2022: 1 commit
  14. 16 Aug, 2022: 1 commit
    • convert multihead to oss (#45019) · f706d95d
      feng_shuai committed
      * convert multihead to oss
      
      * fix: bug
      
      * fix: delete const cast
      
      * fix: don't support bias_qk
      
      * add vit pass
      
      * fix: convert bug and add preln_residual_bias
      
      * support length=-1
      
      * add UT for convert
      
      * add no_bias_qk support for gpu_multihead_op
      
      * remove infer_shape dependency on bias_qk
      
      * OSS can only be used on T4 and A* GPUs
      
      * fix: change API for ROCm CI
  15. 15 Aug, 2022: 2 commits
  16. 09 Aug, 2022: 1 commit
  17. 05 Aug, 2022: 1 commit
  18. 02 Aug, 2022: 1 commit
  19. 01 Aug, 2022: 1 commit
    • unify gpu context (#44740) · 86763023
      Leo Chen committed
      * remove cudaDeviceContext
      
      * remove more template
      
      * fix rocm compile
      
      * remove alias name CUDADeviceContext
      
      * fix compile
      
      * fix tests
      
      * revert changes
  20. 29 Jul, 2022: 3 commits
  21. 26 Jul, 2022: 1 commit
  22. 19 Jul, 2022: 2 commits
  23. 18 Jul, 2022: 1 commit
  24. 13 Jul, 2022: 1 commit
  25. 12 Jul, 2022: 1 commit
  26. 08 Jul, 2022: 2 commits
  27. 07 Jul, 2022: 2 commits
  28. 06 Jul, 2022: 2 commits
  29. 02 Jul, 2022: 1 commit
    • unify cpu context, part2 (#44012) · 755438a7
      Leo Chen committed
      * fix init()
      
      * delete test_device_context
      
      * replace CPUDeviceContext with CPUContext
      
      * fix test_scalar
      
      * remove dot_op.cc
      
      * fix compile
  30. 01 Jul, 2022: 1 commit
    • Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3
      limingshu committed
      * 2nd part of transpose update
      
      * add switch_auto_tune option.
      
      * add some changes according to CI
      
      * refine the structure of auto_tune_base.
      
      * merge develop changes
      
      * reset the switch_set_range and change unittest of transpose auto-tune
      
      * change the kernel auto-tune logic
  31. 30 Jun, 2022: 1 commit