- 20 Jul, 2023: 40 commits
-
Committed by Tianrun Li
DTensor api.relayout and collectives need to recognize the new layout type. PiperOrigin-RevId: 549477658
-
Committed by A. Unique TensorFlower
Add unused parameter enable_stub_generation to the open-source version of pybind_extension to keep it in sync with the internal version. PiperOrigin-RevId: 549476112
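The new parameter is accepted but ignored by the open-source macro. A hypothetical BUILD stanza (target and source names are illustrative, only the parameter name comes from the commit):

```python
# BUILD (Starlark) -- illustrative only. enable_stub_generation is accepted
# for parity with the internal version of pybind_extension but has no effect
# in the open-source build.
pybind_extension(
    name = "_my_module",
    srcs = ["my_module.cc"],
    enable_stub_generation = True,
)
```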
-
Committed by Haibo Huang
This is useful when we need to pass a status through an API boundary. PiperOrigin-RevId: 549475209
-
Committed by Armando Ugalde Velasco
Use MultipleIterationsAutoScaler inside the data service dispatcher implementation as follows:
- UpdateOptimalNumberOfWorkersMetric() in the maintenance thread.
- ReportProcessingTime() when receiving processing times from a WorkerHeartbeat.
- ReportTargetProcessingTime() when receiving a target processing time from a ClientHeartbeat.
- RemoveWorker() when detecting missing workers or executing MaybeRemoveTask.
- RemoveConsumer() when releasing missing clients.
- RegisterIteration() when creating a new Iteration.
- UnregisterIteration() when garbage-collecting old Iterations.

PiperOrigin-RevId: 549469531
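The dispatcher hooks listed in this commit can be sketched as a minimal interface. This is an illustrative stand-in, not the real tf.data service class: the method names mirror the commit, while the state layout and the workers heuristic are assumptions.

```python
import math

class MultipleIterationsAutoScaler:
    """Toy sketch of the dispatcher-side auto-scaler hooks (not the real class)."""

    def __init__(self):
        # iteration_id -> {worker_address -> per-element processing time (s)}
        self._worker_times = {}
        # iteration_id -> {consumer_id -> target per-element time (s)}
        self._target_times = {}

    def register_iteration(self, iteration_id):
        self._worker_times.setdefault(iteration_id, {})
        self._target_times.setdefault(iteration_id, {})

    def unregister_iteration(self, iteration_id):
        self._worker_times.pop(iteration_id, None)
        self._target_times.pop(iteration_id, None)

    def report_processing_time(self, iteration_id, worker, seconds):
        self._worker_times[iteration_id][worker] = seconds

    def report_target_processing_time(self, iteration_id, consumer, seconds):
        self._target_times[iteration_id][consumer] = seconds

    def remove_worker(self, iteration_id, worker):
        self._worker_times[iteration_id].pop(worker, None)

    def remove_consumer(self, iteration_id, consumer):
        self._target_times[iteration_id].pop(consumer, None)

    def optimal_number_of_workers(self):
        # Toy heuristic: enough workers that the average per-worker rate
        # covers the aggregate rate the consumers are asking for.
        best = 1
        for it, times in self._worker_times.items():
            targets = self._target_times.get(it, {})
            if not times or not targets:
                continue
            demand = sum(1.0 / t for t in targets.values())  # elements/sec wanted
            per_worker = sum(1.0 / t for t in times.values()) / len(times)
            best = max(best, math.ceil(demand / per_worker))
        return best
```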
-
Committed by Scott Zhu
PiperOrigin-RevId: 549468160
-
Committed by Anlun Xu
PiperOrigin-RevId: 549464529
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549462485
-
Committed by Nicolas Perez
Take relevant tests from conv_ops_test and conv_ops_3d_test to test the general conv op. Refactor test cases to be parameterized instead of using for loops. PiperOrigin-RevId: 549458691
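The refactoring pattern here (parameterized cases instead of a for loop hidden inside one test body) can be illustrated with stdlib unittest; the real change uses TensorFlow's test infrastructure, and the shape cases below are made up for illustration.

```python
import unittest

class ConvShapeTest(unittest.TestCase):
    # Instead of looping over cases inside one opaque test body, enumerate
    # them as parameters so each case is reported and fails independently.
    CASES = [
        # (input_size, filter_size, stride, expected_output_size), VALID padding
        (5, 3, 1, 3),
        (7, 3, 2, 3),
        (8, 2, 2, 4),
    ]

    def test_output_size(self):
        for n, k, s, expected in self.CASES:
            with self.subTest(n=n, k=k, s=s):
                out = (n - k) // s + 1  # VALID-padding output size
                self.assertEqual(out, expected)
```

With `subTest` (or a parameterized decorator), a failing shape no longer masks the remaining cases.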
-
Committed by Jieying Luo
Note TF runtime side is already set up in xla_launch_util. PiperOrigin-RevId: 549456738
-
Committed by Rahul Joshi
- Add option `xla_gpu_enable_pipelined_reduce_scatter` to enable forward pipelining of reduce-scatter instructions. PiperOrigin-RevId: 549452899
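XLA debug options like this one are typically passed through the `XLA_FLAGS` environment variable before XLA initializes in the process. A minimal sketch (the flag name comes from the commit; the surrounding workflow is an assumption):

```python
import os

# Must be set before XLA (via JAX or TF) is imported and initialized,
# since XLA parses XLA_FLAGS at startup.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "")
    + " --xla_gpu_enable_pipelined_reduce_scatter=true"
).strip()
```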
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549451568
-
Committed by Skye Wanderman-Milne
PiperOrigin-RevId: 549440289
-
Committed by A. Unique TensorFlower
Move pseudo-constant from while loop argument to while loop body in `tfl_while_outline` to avoid a memory penalty at runtime. PiperOrigin-RevId: 549440221
-
Committed by Skye Wanderman-Milne
It's not possible for tuple buffers (yet?). The eventual goal is for ML frameworks to only call individual getters instead of using PjRtBuffer::{logical_}on_device_shape, since passing around xla::Shapes is expensive and often includes more information than is necessary or even meaningful. We'd like to eventually remove PJRT_Buffer_OnDeviceTrimmedShape from the PJRT C API altogether ({logical_}on_device_shape will likely stay for non-ML framework usage). PiperOrigin-RevId: 549438520
-
Committed by Yu Feng
Also tested that relayout to a ragged layout works out of the box! PiperOrigin-RevId: 549438291
-
Committed by A. Unique TensorFlower
http://github.com/tensorflow/runtime/commit/2a7a9bde82ee99f382b26c75769ece54464a210d. PiperOrigin-RevId: 549436957
-
Committed by Scott Zhu
tf.device(CPU:0) is not respected by DTensor, so we need to explicitly convert the tensor value to a DTensor and make sure it is placed on the proper mesh (the CPU host mesh) for logging. Also update the test to mimic the current production behavior. PiperOrigin-RevId: 549432976
-
Committed by A. Unique TensorFlower
[AutoSharding] Make sure that the 1D device mesh in cluster environment matches the assumptions made by `ReshardingCostMixedMeshShape` in auto_sharding_utils. PiperOrigin-RevId: 549428578
-
Committed by Rahul Joshi
PiperOrigin-RevId: 549422754
-
Committed by Rahul Joshi
- Specify the HLO opcode in the pipeliner config, and use it to derive a more descriptive pass name.
- Some minor code cleanup.

PiperOrigin-RevId: 549422627
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 549422464
-
Committed by Fiona Lang
PiperOrigin-RevId: 549421317
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549420699
-
Committed by Kanglan Tang
The following configs are removed:
- v1
- avx2_win
- avx2_linux
- native_arch_linux
- numa
- libc++
- ios_i386
- stackdriver_support
- rbe_lite_linux
- rbe_linux_cuda_nvcc
- rbe_gpu_linux
- rbe_linux_cuda11.2_nvcc_py3.8, rbe_linux_cuda_nvcc_py38
- rbe_linux_cuda11.2_nvcc_py3.10, rbe_linux_cuda_nvcc_py310
- rbe_linux_rocm_py3.7, rbe_linux_rocm_py3.8, rbe_linux_rocm_py3.10
- rbe_linux_cuda_clang_base, rbe_linux_cuda_clang_py**
- rbe_win_py37, rbe_win_py310

If the removal of a config breaks your workflow, you can add it back as a command line option. If you think a config was removed mistakenly, please open an issue on GitHub.

PiperOrigin-RevId: 549419745
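One way to restore a removed config name locally is a user .bazelrc stanza. The stanza below is purely hypothetical: the flags each removed config expanded to are not listed in the commit, so the right-hand side must be replaced with whatever the old config actually defined.

```
# user.bazelrc -- hypothetical illustration. Restores a removed config name
# locally; substitute the flags the old config expanded to.
build:numa --define=with_numa_support=true
```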
-
Committed by Justin Szaday
PiperOrigin-RevId: 549417829
-
Committed by A. Unique TensorFlower
[AutoSharding] Ensure that strategies are generated for custom call ops with user shardings. Previously, no sharding strategies were generated for such ops. PiperOrigin-RevId: 549414995
-
Committed by A. Unique TensorFlower
Updates LLVM usage to match [3cd3f11c174b](https://github.com/llvm/llvm-project/commit/3cd3f11c174b) PiperOrigin-RevId: 549413700
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549411619
-
Committed by Yash Katariya
PiperOrigin-RevId: 549410478
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549410198
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549408597
-
Committed by Rahul Joshi
PiperOrigin-RevId: 549406453
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 549404348
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549401555
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549400109
-
Committed by Russell Power
PiperOrigin-RevId: 549397949
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549397765
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549396252
-
Committed by TJ Xu
PR #4019: [NVIDIA XLA:GPU] Introducing training support for cudnn fused mha (LHLO lowering and thunk, 2nd out of 3)

Imported from GitHub PR https://github.com/openxla/xla/pull/4019

This is a follow-up PR for [this](https://github.com/openxla/xla/pull/3886). In total, the fused MHA training changes are broken into 3 PRs; this is the 2nd of 3. The other PRs:
- 1st: Stream executor changes are in https://github.com/openxla/xla/pull/3886 (merged)
- 3rd: Rewriter changes are in https://github.com/openxla/xla/pull/3726

This PR contains changes to lower the LHLO fused MHA custom call to the cudnnFMHA thunk and runner for both forward and backward FMHA calls. It also adds an activation output to forward calls in the thunk, runner, and stream executor to support training. All functional and unit testing will be in the 3rd and final PR.

Copybara import of the project:
- 3a5165f03b8a64b05486eaadc1c677b6da9346b9 by TJ <tjx@nvidia.com>: Introducing LHLO lowering logic for fused MHA. This PR contains changes to lower the LHLO fused MHA custom call to the cudnnFMHA thunk and runner for both forward and backward FMHA calls.
- ca9b0d9ec59e02eb4d8718575b78fc17373f4fcc by TJ <tjx@nvidia.com>: Address PR comments.
- af688151a3a628cfa115ce4ff50df0f549181054 by TJ <tjx@nvidia.com>: Make uid initialization consistent between backward and forward calls.

Merging this change closes #4019

PiperOrigin-RevId: 549395389
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549394586
-