- 20 Jul, 2023 40 commits
-
Committed by Nicolas Perez
Take relevant tests from conv_ops_test and conv_ops_3d_test to test the general conv Op. Refactor test cases to be parameterized instead of running in for loops. PiperOrigin-RevId: 549458691
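As a hedged illustration of the refactor described above (a minimal sketch, not the actual code from conv_ops_test): with `absl.testing.parameterized`, each parameter set becomes its own named test case, replacing a hand-rolled for loop inside a single test method. The test name and values below are invented for the sketch.

```python
import tensorflow as tf
from absl.testing import parameterized


class ConvOpSketchTest(tf.test.TestCase, parameterized.TestCase):

  # Each named parameter set runs as a separate test case (e.g.
  # test_conv2d_all_ones_float32), instead of iterating dtypes in a loop.
  @parameterized.named_parameters(
      ("float32", tf.float32),
      ("float64", tf.float64),
  )
  def test_conv2d_all_ones(self, dtype):
    x = tf.ones([1, 4, 4, 1], dtype=dtype)
    k = tf.ones([2, 2, 1, 1], dtype=dtype)
    y = tf.nn.conv2d(x, k, strides=1, padding="VALID")
    # A 2x2 all-ones kernel over all-ones input sums 4 elements per window.
    self.assertAllClose(y, tf.fill([1, 3, 3, 1], tf.cast(4, dtype)))


if __name__ == "__main__":
  tf.test.main()
```

Parameterized cases also fail independently, so one broken dtype no longer masks the others, which a for loop inside one test method would.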
-
Committed by Jieying Luo
Note that the TF runtime side is already set up in xla_launch_util. PiperOrigin-RevId: 549456738
-
Committed by Rahul Joshi
- Add option `xla_gpu_enable_pipelined_reduce_scatter` to enable forward pipelining of reduce-scatter instructions. PiperOrigin-RevId: 549452899
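A hedged usage sketch: `xla_gpu_*` debug options are conventionally toggled through the `XLA_FLAGS` environment variable before XLA initializes. That this particular option is exposed that way is an assumption based on the convention, not something the commit states.

```python
import os

# Assumed usage: append the new option to XLA_FLAGS before XLA initializes.
# The flag name comes from the commit; the env-var mechanism is the usual
# convention for xla_gpu_* debug options.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "")
    + " --xla_gpu_enable_pipelined_reduce_scatter=true"
).strip()
```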
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549451568
-
Committed by Skye Wanderman-Milne
PiperOrigin-RevId: 549440289
-
Committed by A. Unique TensorFlower
Move the pseudo-constant from the while loop argument to the while loop body in `tfl_while_outline` to avoid a memory penalty at runtime. PiperOrigin-RevId: 549440221
-
Committed by Skye Wanderman-Milne
It's not possible for tuple buffers (yet?). The eventual goal is for ML frameworks to only call individual getters instead of using PjRtBuffer::{logical_}on_device_shape, since passing around xla::Shapes is expensive and often includes more information than is necessary or even meaningful. We'd like to eventually remove PJRT_Buffer_OnDeviceTrimmedShape from the PJRT C API altogether ({logical_}on_device_shape will likely stay for non-ML framework usage). PiperOrigin-RevId: 549438520
-
Committed by Yu Feng
Also tested that relayout to a ragged layout works out of the box! PiperOrigin-RevId: 549438291
-
Committed by A. Unique TensorFlower
http://github.com/tensorflow/runtime/commit/2a7a9bde82ee99f382b26c75769ece54464a210d. PiperOrigin-RevId: 549436957
-
Committed by Scott Zhu
tf.device(CPU:0) is not respected by DTensor, so we need to explicitly convert the tensor values to DTensor and make sure they are placed on the proper mesh (the CPU host mesh) for logging. Also update the test to mimic the current production behavior. PiperOrigin-RevId: 549432976
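A minimal sketch of the placement idea, assuming the public `tf.experimental.dtensor` API; this is not the commit's actual code, and the mesh dimension name ("batch") and logged value are invented for the example.

```python
import tensorflow as tf
from tensorflow.experimental import dtensor

# Build a single-device CPU host mesh and a replicated layout on it.
host_mesh = dtensor.create_mesh([("batch", 1)], device_type="CPU")
layout = dtensor.Layout.replicated(host_mesh, rank=1)

loss = tf.constant([0.25])
# Explicitly place the value on the CPU host mesh instead of relying on
# tf.device("CPU:0"), which DTensor does not respect.
loss_on_host = dtensor.copy_to_mesh(loss, layout)
tf.print("loss:", loss_on_host)
```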
-
Committed by A. Unique TensorFlower
[AutoSharding] Make sure that the 1D device mesh in the cluster environment matches the assumptions made by `ReshardingCostMixedMeshShape` in auto_sharding_utils. PiperOrigin-RevId: 549428578
-
Committed by Rahul Joshi
PiperOrigin-RevId: 549422754
-
Committed by Rahul Joshi
- Specify the HLO opcode in the pipeliner config, and use that to derive a more descriptive pass name
- Some minor code cleanup
PiperOrigin-RevId: 549422627
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 549422464
-
Committed by Fiona Lang
PiperOrigin-RevId: 549421317
-
Committed by Yishuang Pang
PiperOrigin-RevId: 549420699
-
Committed by Kanglan Tang
The following configs are removed:
- v1
- avx2_win
- avx2_linux
- native_arch_linux
- numa
- libc++
- ios_i386
- stackdriver_support
- rbe_lite_linux
- rbe_linux_cuda_nvcc
- rbe_gpu_linux
- rbe_linux_cuda11.2_nvcc_py3.8, rbe_linux_cuda_nvcc_py38
- rbe_linux_cuda11.2_nvcc_py3.10, rbe_linux_cuda_nvcc_py310
- rbe_linux_rocm_py3.7, rbe_linux_rocm_py3.8, rbe_linux_rocm_py3.10
- rbe_linux_cuda_clang_base, rbe_linux_cuda_clang_py**
- rbe_win_py37, rbe_win_py310

If the removal of a config breaks your workflow, you can add it back as a command line option. If you think a config was removed by mistake, please open an issue on GitHub. PiperOrigin-RevId: 549419745
-
Committed by Justin Szaday
PiperOrigin-RevId: 549417829
-
Committed by A. Unique TensorFlower
[AutoSharding] Ensure that strategies are generated for custom call ops with user shardings. Previously, no sharding strategies were generated for such ops. PiperOrigin-RevId: 549414995
-
Committed by A. Unique TensorFlower
Updates LLVM usage to match [3cd3f11c174b](https://github.com/llvm/llvm-project/commit/3cd3f11c174b) PiperOrigin-RevId: 549413700
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549411619
-
Committed by Yash Katariya
PiperOrigin-RevId: 549410478
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549410198
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549408597
-
Committed by Rahul Joshi
PiperOrigin-RevId: 549406453
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 549404348
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549401555
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549400109
-
Committed by Russell Power
PiperOrigin-RevId: 549397949
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549397765
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549396252
-
Committed by TJ Xu
PR #4019: [NVIDIA XLA:GPU] Introducing training support for cudnn fused mha (LHLO lowering and thunk, 2nd out of 3)

Imported from GitHub PR https://github.com/openxla/xla/pull/4019

This is a follow-up PR to https://github.com/openxla/xla/pull/3886. In total, the fused MHA training changes are broken into 3 PRs; this is the 2nd of 3. The other PRs:
- 1st: stream executor changes, https://github.com/openxla/xla/pull/3886 (merged)
- 3rd: rewriter changes, https://github.com/openxla/xla/pull/3726

This PR contains changes to lower the LHLO fused MHA custom call to the cudnnFMHA thunk and runner, for both forward and backward FMHA calls. It also adds an activation output to forward calls in the thunk, runner, and stream executor to support training. All functional and unit testing will be in the 3rd and final PR.

Copybara import of the project:
-- 3a5165f03b8a64b05486eaadc1c677b6da9346b9 by TJ <tjx@nvidia.com>: Introducing LHLO lowering logic for fused MHA. This PR contains changes to lower the LHLO fused MHA custom call to the cudnnFMHA thunk and runner for both forward and backward FMHA calls.
-- ca9b0d9ec59e02eb4d8718575b78fc17373f4fcc by TJ <tjx@nvidia.com>: Address PR comments
-- af688151a3a628cfa115ce4ff50df0f549181054 by TJ <tjx@nvidia.com>: Make uid initialization consistent between backward and forward calls

Merging this change closes #4019

PiperOrigin-RevId: 549395389
-
Committed by Armando Ugalde Velasco
PiperOrigin-RevId: 549394586
-
Committed by Armando Ugalde Velasco
Declare the MultipleIterationsAutoScaler class, which is needed to estimate the optimal number of workers for all Iterations running in a tf.data service cluster. Also delete the UpdateOptimalNumberOfWorkers metric method from AutoScaler, since it will be moved to MultipleIterationsAutoScaler. PiperOrigin-RevId: 549392975
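The commit only declares the class; as a loose illustration of the aggregation problem it targets, here is a hypothetical Python sketch. The real class lives in the tf.data service C++ code, its method names differ, and its actual aggregation policy is not stated in this message; taking the maximum across iterations is an assumption made for the sketch.

```python
class MultipleIterationsAutoScalerSketch:
  """Hypothetical sketch: per-iteration estimates rolled up to one number."""

  def __init__(self):
    self._optimal_workers = {}  # iteration_id -> estimated worker count

  def report_estimate(self, iteration_id, workers):
    # Each running iteration reports its own AutoScaler-style estimate.
    self._optimal_workers[iteration_id] = workers

  def optimal_number_of_workers(self):
    # The cluster must keep up with its most demanding iteration, so one
    # plausible roll-up (an assumption here) is the maximum estimate.
    return max(self._optimal_workers.values(), default=0)
```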
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 549391535
-
Committed by Matt Callanan
PiperOrigin-RevId: 549387461
-
Committed by Skye Wanderman-Milne
This change adds the following interfaces:
* PJRT_Buffer_DynamicDimensionIndices
* PjRtBuffer::dynamic_dimensions
* PjRtBuffer::has_dynamic_dimensions

PjRtBuffer::has_dynamic_dimensions is the only function actually needed by callers. However, I implemented it by adding PJRT_Buffer_DynamicDimensionIndices to the C API, rather than a C function that just returns whether there are dynamic dimensions, since returning which dimensions are dynamic gives the same information and is more future-proof in case we do need the extra info.

To implement the C function, I added PjRtBuffer::dynamic_dimensions, which is only implemented for non-C PJRT clients for now, although it could be implemented in PjRtCApiClient if needed (it's a bit annoying because the interface is that of xla::Shape instead of PJRT_Buffer_DynamicDimensionIndices; I made them different because the PJRT_Buffer_DynamicDimensionIndices interface is better for implementing has_dynamic_dimensions).

The eventual goal is for ML frameworks to only call individual getters instead of using PjRtBuffer::{logical_}on_device_shape, since passing around xla::Shapes is expensive and often includes more information than is necessary or even meaningful. We'd like to eventually remove PJRT_Buffer_OnDeviceTrimmedShape from the PJRT C API altogether ({logical_}on_device_shape will likely stay for non-ML framework usage).

PiperOrigin-RevId: 549381647
-
Committed by Peter Hawkins
PiperOrigin-RevId: 549381271
-
Committed by Antonio Sanchez
Previously, CUDA FFT plans were cached based solely on the FFT parameters. However, this breaks in multi-GPU setups, since `cufftHandle`s are device-specific: it led to garbage outputs when an FFT plan created on one device was used on another. Added the GPU device ID to the plan cache key. Fixes #60926. PiperOrigin-RevId: 549375830
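A minimal sketch of the caching idea, assuming nothing about TensorFlow's actual C++ cache beyond what the commit says: plan handles are only valid on the device that created them, so the cache key must include the device id alongside the FFT parameters. All names below are hypothetical.

```python
_plan_cache = {}

def get_fft_plan(device_id, shape, fft_type):
  # Before the fix, the key was effectively just (shape, fft_type), so a
  # plan created on GPU 0 could be handed to GPU 1 and produce garbage.
  key = (device_id, shape, fft_type)
  if key not in _plan_cache:
    _plan_cache[key] = _create_plan(device_id, shape, fft_type)
  return _plan_cache[key]

def _create_plan(device_id, shape, fft_type):
  # Hypothetical stand-in for device-specific plan creation (cufftPlan*).
  return ("plan", device_id, shape, fft_type)
```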
-
Committed by Eugene Zhulenev
PiperOrigin-RevId: 549373738
-