- 20 Jul, 2023: 40 commits
-
Committed by Tianrun Li
DTensor api.relayout and collectives need to recognize the new layout type. PiperOrigin-RevId: 549477658
-
Committed by A. Unique TensorFlower
Add unused parameter enable_stub_generation to the open-source version of pybind_extension to keep it in sync with the internal version. PiperOrigin-RevId: 549476112
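The new parameter is accepted but ignored by the open-source macro. A hypothetical BUILD stanza (target and source names are illustrative, only the parameter name comes from the commit):

```python
# BUILD (Starlark) -- illustrative only. enable_stub_generation is accepted
# for parity with the internal version of pybind_extension but has no effect
# in the open-source build.
pybind_extension(
    name = "_my_module",
    srcs = ["my_module.cc"],
    enable_stub_generation = True,
)
```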
-
Committed by Haibo Huang
This is useful when we need to pass a status through an API boundary. PiperOrigin-RevId: 549475209
-
Committed by Armando Ugalde Velasco
Use MultipleIterationsAutoScaler inside the data service dispatcher implementation as follows:
- UpdateOptimalNumberOfWorkersMetric() in the maintenance thread.
- ReportProcessingTime() when receiving processing times from a WorkerHeartbeat.
- ReportTargetProcessingTime() when receiving a target processing time from a ClientHeartbeat.
- RemoveWorker() when detecting missing workers or executing MaybeRemoveTask.
- RemoveConsumer() when releasing missing clients.
- RegisterIteration() when creating a new Iteration.
- UnregisterIteration() when garbage-collecting old Iterations.

PiperOrigin-RevId: 549469531
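The dispatcher hooks listed in this commit can be sketched as a minimal interface. This is an illustrative stand-in, not the real tf.data service class: the method names mirror the commit, while the state layout and the workers heuristic are assumptions.

```python
import math

class MultipleIterationsAutoScaler:
    """Toy sketch of the dispatcher-side auto-scaler hooks (not the real class)."""

    def __init__(self):
        # iteration_id -> {worker_address -> per-element processing time (s)}
        self._worker_times = {}
        # iteration_id -> {consumer_id -> target per-element time (s)}
        self._target_times = {}

    def register_iteration(self, iteration_id):
        self._worker_times.setdefault(iteration_id, {})
        self._target_times.setdefault(iteration_id, {})

    def unregister_iteration(self, iteration_id):
        self._worker_times.pop(iteration_id, None)
        self._target_times.pop(iteration_id, None)

    def report_processing_time(self, iteration_id, worker, seconds):
        self._worker_times[iteration_id][worker] = seconds

    def report_target_processing_time(self, iteration_id, consumer, seconds):
        self._target_times[iteration_id][consumer] = seconds

    def remove_worker(self, iteration_id, worker):
        self._worker_times[iteration_id].pop(worker, None)

    def remove_consumer(self, iteration_id, consumer):
        self._target_times[iteration_id].pop(consumer, None)

    def optimal_number_of_workers(self):
        # Toy heuristic: enough workers that the average per-worker rate
        # covers the aggregate rate the consumers are asking for.
        best = 1
        for it, times in self._worker_times.items():
            targets = self._target_times.get(it, {})
            if not times or not targets:
                continue
            demand = sum(1.0 / t for t in targets.values())  # elements/sec wanted
            per_worker = sum(1.0 / t for t in times.values()) / len(times)
            best = max(best, math.ceil(demand / per_worker))
        return best
```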
-
Committed by Scott Zhu
PiperOrigin-RevId: 549468160
-
Committed by Anlun Xu
PiperOrigin-RevId: 549464529
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549462485
-
Committed by Nicolas Perez
Take relevant tests from conv_ops_test and conv_ops_3d_test to test the general conv op. Refactor test cases to be parameterized instead of using for loops. PiperOrigin-RevId: 549458691
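The refactoring pattern here (parameterized cases instead of a for loop hidden inside one test body) can be illustrated with stdlib unittest; the real change uses TensorFlow's test infrastructure, and the shape cases below are made up for illustration.

```python
import unittest

class ConvShapeTest(unittest.TestCase):
    # Instead of looping over cases inside one opaque test body, enumerate
    # them as parameters so each case is reported and fails independently.
    CASES = [
        # (input_size, filter_size, stride, expected_output_size), VALID padding
        (5, 3, 1, 3),
        (7, 3, 2, 3),
        (8, 2, 2, 4),
    ]

    def test_output_size(self):
        for n, k, s, expected in self.CASES:
            with self.subTest(n=n, k=k, s=s):
                out = (n - k) // s + 1  # VALID-padding output size
                self.assertEqual(out, expected)
```

With `subTest` (or a parameterized decorator), a failing shape no longer masks the remaining cases.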
-
Committed by Jieying Luo
Note TF runtime side is already set up in xla_launch_util. PiperOrigin-RevId: 549456738
-
Committed by Rahul Joshi
- Add option `xla_gpu_enable_pipelined_reduce_scatter` to enable forward pipelining of reduce-scatter instructions. PiperOrigin-RevId: 549452899
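XLA debug options like this one are typically passed through the `XLA_FLAGS` environment variable before XLA initializes in the process. A minimal sketch (the flag name comes from the commit; the surrounding workflow is an assumption):

```python
import os

# Must be set before XLA (via JAX or TF) is imported and initialized,
# since XLA parses XLA_FLAGS at startup.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "")
    + " --xla_gpu_enable_pipelined_reduce_scatter=true"
).strip()
```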
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549451568
-
Committed by Skye Wanderman-Milne
PiperOrigin-RevId: 549440289
-
Committed by A. Unique TensorFlower
Move pseudo-constant from while loop argument to while loop body in `tfl_while_outline` to avoid a memory penalty at runtime. PiperOrigin-RevId: 549440221
-
Committed by Skye Wanderman-Milne
It's not possible for tuple buffers (yet?). The eventual goal is for ML frameworks to only call individual getters instead of using PjRtBuffer::{logical_}on_device_shape, since passing around xla::Shapes is expensive and often includes more information than is necessary or even meaningful. We'd like to eventually remove PJRT_Buffer_OnDeviceTrimmedShape from the PJRT C API altogether ({logical_}on_device_shape will likely stay for non-ML framework usage). PiperOrigin-RevId: 549438520
-
Committed by Yu Feng
Also tested that relayout to a ragged layout works out of the box! PiperOrigin-RevId: 549438291
-
Committed by A. Unique TensorFlower
http://github.com/tensorflow/runtime/commit/2a7a9bde82ee99f382b26c75769ece54464a210d. PiperOrigin-RevId: 549436957
-
Committed by Scott Zhu
tf.device(CPU:0) is not respected by DTensor, so we need to explicitly convert the tensor value to a DTensor and make sure it is placed on the proper mesh (the CPU host mesh) for logging. Also update the test to mimic the current production behavior. PiperOrigin-RevId: 549432976
-
Committed by A. Unique TensorFlower
[AutoSharding] Make sure that the 1D device mesh in cluster environment matches the assumptions made by `ReshardingCostMixedMeshShape` in auto_sharding_utils. PiperOrigin-RevId: 549428578
-
Committed by Rahul Joshi
PiperOrigin-RevId: 549422754
-
Committed by Rahul Joshi
- Specify the HLO opcode in the pipeliner config, and use it to derive a more descriptive pass name.
- Some minor code cleanup.

PiperOrigin-RevId: 549422627
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 549422464
-
Committed by Fiona Lang
PiperOrigin-RevId: 549421317
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549420699
-
Committed by Kanglan Tang
The following configs are removed:
- v1
- avx2_win
- avx2_linux
- native_arch_linux
- numa
- libc++
- ios_i386
- stackdriver_support
- rbe_lite_linux
- rbe_linux_cuda_nvcc
- rbe_gpu_linux
- rbe_linux_cuda11.2_nvcc_py3.8, rbe_linux_cuda_nvcc_py38
- rbe_linux_cuda11.2_nvcc_py3.10, rbe_linux_cuda_nvcc_py310
- rbe_linux_rocm_py3.7, rbe_linux_rocm_py3.8, rbe_linux_rocm_py3.10
- rbe_linux_cuda_clang_base, rbe_linux_cuda_clang_py**
- rbe_win_py37, rbe_win_py310

If the removal of a config breaks your workflow, you can add it back as a command line option. If you think a config was removed mistakenly, please open an issue on GitHub.

PiperOrigin-RevId: 549419745
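One way to restore a removed config name locally is a user .bazelrc stanza. The stanza below is purely hypothetical: the flags each removed config expanded to are not listed in the commit, so the right-hand side must be replaced with whatever the old config actually defined.

```
# user.bazelrc -- hypothetical illustration. Restores a removed config name
# locally; substitute the flags the old config expanded to.
build:numa --define=with_numa_support=true
```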
-
Committed by Justin Szaday
PiperOrigin-RevId: 549417829
-
Committed by A. Unique TensorFlower
[AutoSharding] Ensure that strategies are generated for custom call ops with user shardings. Previously, no sharding strategies were generated for such ops. PiperOrigin-RevId: 549414995
-
Committed by A. Unique TensorFlower
Updates LLVM usage to match [3cd3f11c174b](https://github.com/llvm/llvm-project/commit/3cd3f11c174b) PiperOrigin-RevId: 549413700
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549411619
-
Committed by Yash Katariya
PiperOrigin-RevId: 549410478
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549410198
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549408597
-
Committed by Rahul Joshi
PiperOrigin-RevId: 549406453
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 549404348
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549401555
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549400109
-
Committed by Russell Power
PiperOrigin-RevId: 549397949
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549397765
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549396252
-
Committed by TJ Xu
PR #4019: [NVIDIA XLA:GPU] Introducing training support for cudnn fused mha (LHLO lowering and thunk, 2nd out of 3)

Imported from GitHub PR https://github.com/openxla/xla/pull/4019

This is a follow-up PR for [this](https://github.com/openxla/xla/pull/3886). In total, the fused MHA training changes are broken into 3 PRs; this is the 2nd of 3. The other PRs:
- 1st: Stream executor changes are in https://github.com/openxla/xla/pull/3886 (merged)
- 3rd: Rewriter changes are in https://github.com/openxla/xla/pull/3726

This PR contains changes to lower the LHLO fused MHA custom call to the cudnnFMHA thunk and runner for both forward and backward FMHA calls. It also adds an activation output to forward calls in the thunk, runner, and stream executor to support training. All functional and unit testing will be in the 3rd and final PR.

Copybara import of the project:
- 3a5165f03b8a64b05486eaadc1c677b6da9346b9 by TJ <tjx@nvidia.com>: Introducing LHLO lowering logic for fused MHA. This PR contains changes to lower the LHLO fused MHA custom call to the cudnnFMHA thunk and runner for both forward and backward FMHA calls.
- ca9b0d9ec59e02eb4d8718575b78fc17373f4fcc by TJ <tjx@nvidia.com>: Address PR comments.
- af688151a3a628cfa115ce4ff50df0f549181054 by TJ <tjx@nvidia.com>: Make uid initialization consistent between backward and forward calls.

Merging this change closes #4019

PiperOrigin-RevId: 549395389
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549394586
-