1. 18 Jul, 2023 (4 commits)
  2. 15 Jul, 2023 (2 commits)
  3. 14 Jul, 2023 (6 commits)
  4. 12 Jul, 2023 (1 commit)
    • Reduce Unit Test Times (Part 3) (#3850) · aef6c65c
      Michael Wyatt authored
      * add coverage report
      
      * define env vars in shared action
      
      * reduce time for longest running tests
      
      * fix broken shared action
      
      * reduce test time
      
      * reducing Pipeline test times
      
      * further reducing test times
      
      * rework Z3 test
      
      * testing new mp.pool and persistent dist envs
      
      * fix import
      
      * reuse distributed environment for tests with lots of param combos
      
      * fix for dist teardown
      
      * fix pickling issue with pool cache
      
      * actually fix pickling problem
      
      * avoid running pool cache stuff on non-distributed tests
      
      * fix issues with nested mp.pool
      
      * fix for nested pools in Pipeline Engine
      
      * re-add params
      
      * update workflows with pytest opts
      
      * implement feedback
      
      * resolve race condition with port selection (see the port sketch after this entry)
      
      * Update tests/unit/common.py
      
      ---------
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
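      The port-selection race above presumably comes from parallel test workers probing
      for a free master port for torch.distributed and colliding. A minimal sketch of
      the usual fix, not the PR's exact code (get_free_port is a hypothetical helper
      name): let the OS assign an unused port by binding to port 0.

          import socket

          def get_free_port() -> int:
              # Port 0 tells the kernel to pick any currently unused port; this
              # avoids two test workers racing to claim the same hard-coded port.
              with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                  s.bind(("127.0.0.1", 0))
                  return s.getsockname()[1]  # a small reuse window remains after close

          # e.g. exported as MASTER_PORT before torch.distributed.init_process_group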
  5. 07 Jul, 2023 (1 commit)
  6. 06 Jul, 2023 (2 commits)
  7. 30 Jun, 2023 (2 commits)
  8. 29 Jun, 2023 (1 commit)
  9. 27 Jun, 2023 (2 commits)
  10. 24 Jun, 2023 (2 commits)
  11. 22 Jun, 2023 (1 commit)
  12. 14 Jun, 2023 (1 commit)
    • Fix apex install bugs (#3741) · 1b401823
      Logan Adams authored
      * Fix apex installation
      
      * Switch install flag from build-opt to global-opt to fix missing cpp_ext (see the sketch after this entry)
      
      * Try installing with support for newer pip
      
      * Add build packaging
      
      * Update to latest
      
      * Pin to specific commit while pyproject.toml is fixed
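      The build-opt/global-opt switch matters because apex's setup.py only builds
      its C++/CUDA extensions when it finds the corresponding flags in its own
      argv; pip's --global-option forwards flags to setup.py itself, which is
      where that check lives, while the build-option flags evidently did not
      reach it. A simplified illustration of the gating pattern, not apex's
      actual file:

          import sys
          from setuptools import setup

          ext_modules = []
          if "--cpp_ext" in sys.argv:      # only true when pip forwards the flag here
              sys.argv.remove("--cpp_ext")
              # ... append the CppExtension build targets here ...

          setup(name="example-ext-package", ext_modules=ext_modules)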
  13. 01 Jun, 2023 (1 commit)
  14. 16 May, 2023 (1 commit)
    • [CPU] Support Intel CPU inference (#3041) · 1f72082f
      Ma, Guokai authored
      * add fallback path for kernels used in megatron
      
      * temporary numactl WA for SPR 56-core
      
      * adapt core allocation according to number of ranks
      
      * add switch to turn on numactl
      
      * detect number of cores on the system
      
      * allow selecting a subset of the cores on the system to bind
      
      * remove unneeded changes
      
      * add ccl backend
      
      * change nccl to ccl
      
      * remove unused code
      
      * add comm/ccl to ops
      
      * initial ccl comm support
      
      * first broadcast case passed
      
      * add CCL_Backend to DeepSpeed
      
      * support comm timer for CPU
      
      * support barrier for comm backend
      
      * support specifying the master address from the deepspeed command line
      
      * support pytorch 2.0
      
      * remove 'block' from api
      
      * Tweak for debug
      Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>
      
      * Remove unnecessary directory
      Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>
      
      * Add bf16 kernel support for inference
      
      * Add temporary torch implementation for cpu inference
      
      * Add softmax ops cpu fallback for inference
      
      * bind cores to numa domain as well
      
      * merge latest change in gma/numactl
      
      * initial bf16 kernel support with fallback path
      
      * initial fallback path for bloom kernel injection
      
      * fix softmax attn mask
      
      * check KMP_AFFINITY to avoid conflict with numactl
      
      * New CCLBackend which utilize TorchBackend for initialization
      
      * roll back last change because it causes a result error
      
      * fix issue where bloom injection policy TP could not work (see the usage sketch after this entry):
      
      injection_policy={BloomBlock: ("self_attention.dense", "mlp.dense_4h_to_h")}
      
      * Use TorchBackend to initialize CCLBackend, make behavior consistent
      
      * remove comm under deepspeed/ops
      
      * add license header
      
      * code clean up
      
      * fix format issue
      
      * remove magic number in main address
      
      * add caching support but not turn on by default
      
      * change name of inference_cuda_module to inference_module
      
      * Check for is_synchronized_device in accelerator before getting Event
      
      * fix typo
      
      * Fix fallback path of softmax kernel on CUDA device for BF16 data type: because CUDA tril does not support the BF16 datatype, enforce fp32
      
      * add cpu backend files
      
      * change CPU_Accelerator op_builder_dir
      
      * remove cpu_kernel_path
      
      * using CPU_Accelerator on non-cuda device
      
      * fix deepspeed.op_builder => deepspeed.ops.op_builder
      
      * add alias for num_gpus: num_accelerators
      
      * allow loading cpu_builder in build stage
      
      * Assume cuda available if torch not installed
      
      * add oneccl_binding_pt to requirements
      
      * move oneccl-binding-pt to separate requirements-cpu.txt
      
      * add missing file
      
      * use dependency_links in setuptools.setup() call for additional dependency links
      
      * install oneccl_bind_pt in workflows
      
      * change oneccl_bind_pt's version from 1.13 to 2.0
      
      * use intel_extension_for_pytorch as indicator that CPU_Accelerator should be used
      
      * Add indicator for Accelerator used
      
      * change foo.c to foo.cpp
      
      * exclude 'cpu' directory in CUDA op builder reflection
      
      * add a cpu-inference workflow
      
      * run cpu-inference workflow on self-hosted instance
      
      * change cpu runs-on node to v100 node
      
      * print out python version in workflow
      
      * add verbose in pip command to understand oneccl_bind_pt install issue
      
      * update cpu-inference workflow
      
      * add a stage to detect instance instruction sets
      
      * add back bf16 support for CPU inference
      
      * enable autoTP for bloom
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * update workflow to detect cpu instruction sets
      
      * temporary WA for Intel Extension for PyTorch AVX2 instruction set detection
      
      * change cpu-inference workflow machine to ubuntu-20.04
      
      * add sharded checkpoint loading for AutoTP path to reduce the peak memory in initialization stage
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * enable policy for llama
      
      * use a special build ipex to test avx2 detection fix
      
      * fix format
      
      * fix test fail issue
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * fix gptj sharded checkpoint loading problem
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * return a not implemented build in get_op_builder in cpu_backend
      
      * support cpu device in tests
      
      * use cpuinfo to extract number of CPUs
      
      * use ~/tmp as transformer cache rather than /blob/
      
      * Add support for mpich launcher with prefer_deepspeed_comm
      
      * add missing modification in accelerator
      
      * enable IMPI launcher
      
      * remove unused file and fix formatting
      
      * clean up ccl.cpp
      
      * Less confusing error message when certain op builders are not implemented
      
      * Fix license header
      
      * Add license header
      
      * add license headers
      
      * add license header
      
      * fix cuda specific code in test
      
      * update CPU workflow
      
      * use numactl to bind to core
      
      * allow bind_cores_to_rank in multi-node impi runner
      
      * fix format error
      
      * Remove InferenceBuilder
      
      * fix format error in numa.py
      
      * check whether op is in installed ops in ds_report.py
      
      * allow overriding accelerator with DS_ACCELERATOR='cuda', 'cpu' or 'xpu' (see the usage sketch after this entry)
      
      * lazy init class_dict in CUDA_Accelerator to avoid cyclic initialization of CUDA_Accelerator
      
      * put the short path at the beginning of real_accelerator.py
      
      * device_count returns the number of NUMA nodes
      
      * fix typo
      
      * install numactl in cpu workflow
      
      * Follow comments
      
      * Better implementation of device_count() and current_device()
      
      * remove dependency_link for Intel Extension for DeepSpeed
      
      * check is_synchronized_device in timer only once
      
      * remove env mapping WA in cpu_accelerator
      
      * fix duplicate definition
      
      * fix format error
      
      * refine ccl backend selection
      
      * move comments to the right place
      
      * remove prefer_deepspeed_comm, use CCLBackend by default
      
      * refactor fallback path
      
      * Fix execution failure in kernel injection path
      
      * do not refactor the kernel injection fallback path in residual_add because it contains a function call with side effects
      
      * guard residual_add fallback path with environ DS_KI_FALLBACK=True
      
      * fix format error
      
      * add test for allreduce on CPU workflow
      
      * fix format error
      
      * Fall back to TorchBackend if CCLBackend kernels are not implemented
      
      * Update Intel Extension for Pytorch installation link
      
      * Don't specify version number of Intel Extension for PyTorch
      
      * install oneCCL for CCLBackend
      
      * fix link path for CPU comm kernels
      
      * fix source oneCCL environment
      
      * source oneCCL env before running UTs
      
      * Give more specific instruction when CCL_ROOT not defined
      
      ---------
      Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      Co-authored-by: sdp <sdp@aia-sdp-spr-108864.jf.intel.com>
      Co-authored-by: Cao, Zhong Z <zhong.z.cao@intel.com>
      Co-authored-by: Zhenhuan Chen <zhenhuan.chen@intel.com>
      Co-authored-by: baodii <di.bao@intel.com>
      Co-authored-by: Wang, Yi A <yi.a.wang@intel.com>
      Co-authored-by: jianan-gu <jianan.gu@intel.com>
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
      Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
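      The injection_policy shown in the bloom autoTP fix above is passed to
      deepspeed.init_inference, which tells tensor parallelism which output
      linears to all-reduce. A usage sketch in which the model name, dtype,
      and mp_size are illustrative, not from the PR; only the policy dict is:

          import torch
          import deepspeed
          from transformers import AutoModelForCausalLM
          from transformers.models.bloom.modeling_bloom import BloomBlock

          model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
          # Shard the model for tensor-parallel inference with the quoted policy.
          model = deepspeed.init_inference(
              model,
              mp_size=2,                  # illustrative tensor-parallel degree
              dtype=torch.bfloat16,
              injection_policy={BloomBlock: ("self_attention.dense",
                                             "mlp.dense_4h_to_h")},
          )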
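      The DS_ACCELERATOR override above can be exercised as follows; a minimal
      usage sketch, assuming the variable is set before the accelerator is first
      resolved in the process:

          import os
          os.environ["DS_ACCELERATOR"] = "cpu"   # or "cuda" / "xpu"

          from deepspeed.accelerator import get_accelerator

          acc = get_accelerator()
          print(acc.device_name())   # "cpu" when the CPU accelerator is selected
          print(acc.device_count())  # per this PR, the number of NUMA nodes on CPU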
  15. 12 May, 2023 (1 commit)
  16. 04 May, 2023 (1 commit)
  17. 25 Apr, 2023 (1 commit)
  18. 19 Apr, 2023 (3 commits)
  19. 15 Apr, 2023 (1 commit)
  20. 14 Apr, 2023 (1 commit)
  21. 13 Apr, 2023 (1 commit)
    • Update AMD workflows (#3179) · 9408a866
      Logan Adams authored
      * Update AMD workflows
      
      * Update MI200 test flow to use torch latest
      
      * Update tolerances to values that pass (will fix before completing PR)
      
      * Revert changes to atol (see the tolerance sketch after this entry)
      
      * Rename workflows
      
      * Fix CI badges
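      The tolerance items above follow the common pattern of loosening atol/rtol
      in elementwise comparisons so results that differ only by platform-level
      numeric drift (e.g. ROCm vs CUDA kernels) still pass. A generic sketch
      with placeholder values, not the PR's:

          import torch

          def assert_close(actual: torch.Tensor, expected: torch.Tensor):
              # A looser absolute tolerance absorbs small cross-platform drift
              # without hiding order-of-magnitude errors.
              assert torch.allclose(actual, expected, rtol=1e-5, atol=1e-3), (
                  f"max abs diff: {(actual - expected).abs().max().item()}")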
  22. 06 Apr, 2023 (1 commit)
  23. 05 Apr, 2023 (1 commit)
  24. 24 Mar, 2023 (1 commit)
  25. 22 Mar, 2023 (1 commit)