- Sep 14, 2023: 6 commits
-
-
Committed by Jieying Luo
PiperOrigin-RevId: 565092976
-
Committed by Marcello Maggioni
[XLA] Rework dot() sharding propagation to look ahead at instruction shardings and choose a sharding for dot() that agrees with its users when possible.
PiperOrigin-RevId: 565086052
-
Committed by Oleg Shyshkov
PiperOrigin-RevId: 565083641
-
Committed by Marcello Maggioni
Small collectives might be better off when sunk, and there are other potential use cases. Also fix a bug where we were accepting reuse of the data that we were storing, and change the tests that used that pattern to match the fix.
PiperOrigin-RevId: 565080772
-
Committed by Bixia Zheng
When decomposing collective-permute, stop chaining the decomposed Send and Recv instructions through control dependence. This is because the generated HLO program is correct even without the control dependence chaining. The purpose of the control dependence chaining is to support a scheduler, such as the latency hiding scheduler, so it will instead be added in the latency hiding scheduler preparation pass. Not producing the control dependence chaining while decomposing collective-permute also simplifies the implementation of the collective-pipeliner in pipelining Send and Recv instructions.
PiperOrigin-RevId: 565073772
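For context, a minimal sketch of the chaining that a scheduler-preparation pass might restore; the helper name and surrounding code are assumptions, but `HloInstruction::AddControlDependencyTo` is XLA's existing mechanism for ordering instructions that have no data dependency:

```cpp
#include <vector>

#include "absl/status/status.h"
#include "tsl/platform/errors.h"
#include "xla/hlo/ir/hlo_instruction.h"

namespace xla {

// Hypothetical helper: force a sequential order on the Send/Recv
// instructions produced by collective-permute decomposition by chaining
// them with control dependencies. This does not change program semantics;
// it only constrains the (latency hiding) scheduler.
absl::Status ChainWithControlDependencies(
    const std::vector<HloInstruction*>& ordered_instructions) {
  for (size_t i = 1; i < ordered_instructions.size(); ++i) {
    // A control dependency means "the predecessor must be scheduled first".
    TF_RETURN_IF_ERROR(ordered_instructions[i - 1]->AddControlDependencyTo(
        ordered_instructions[i]));
  }
  return absl::OkStatus();
}

}  // namespace xla
```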
-
Committed by Benjamin Kramer
Updates LLVM usage to match [8ebe1d1cc1e4](https://github.com/llvm/llvm-project/commit/8ebe1d1cc1e4).
PiperOrigin-RevId: 565069541
-
- Sep 13, 2023: 34 commits
-
-
Committed by A. Unique TensorFlower
Without this additional bazelrc file, TSL will be built with the system compiler, which happens to be GCC.
PiperOrigin-RevId: 565060297
-
Committed by Peter Hawkins
This change was merged upstream in https://github.com/openai/triton/pull/2068 but hasn't made it to the OpenXLA fork yet. It improves build time in OSS by not building a number of MLIR dialects/passes that are not needed.
PiperOrigin-RevId: 565058458
-
Committed by Oleg Shyshkov
PiperOrigin-RevId: 565025664
-
Committed by George Karpenkov
PiperOrigin-RevId: 565024582
-
Committed by Tamás Danyluk
Unify iterator names to *_it; use `auto*` instead of `auto` when the type is a pointer type.
PiperOrigin-RevId: 565016627
-
Committed by Johannes Reifferscheid
This is another prefactoring to make Triton fusions compatible with FusionInterface and partially fused HLO. For the former, we need the LaunchDimension computation to be a separate function. For the latter, we change the launch-dimension function signatures to no longer take an HloComputation, because we don't yet have one during fusion (at least not a complete one). For now, this change is a no-op, since we do not yet have any boundary functions for non-fusion ops.
PiperOrigin-RevId: 565015567
-
Committed by Fergus Henderson
Note that the corresponding FlatBuffer C API type can also be used instead of the FlatBuffer C++ API type for the TFLiteSettings FlatBuffer parameter.
PiperOrigin-RevId: 565008668
-
Committed by Son Tuan Vu
From the original code at commit 03d304b, we are only supposed to scale the reduction tile size of dimX by the unroll factor for column reductions, so the check when creating `ReductionCodegenInfo` is only valid for column reductions.
PiperOrigin-RevId: 564991695
-
Committed by Ilia Sergachev
Add a comparison operator for fragments, improve encapsulation, and fix comments.
PiperOrigin-RevId: 564990063
-
Committed by Johannes Reifferscheid
LaunchDimensions are being decoupled from codegen, but shared memory requirements are only known after codegen. Currently, this feature is only used for Triton fusions; all other fusions that use shared memory allocate it within the kernel.
PiperOrigin-RevId: 564981685
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 564978233
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 564978209
-
Committed by A. Unique TensorFlower
The upcoming CUDA 12 upgrade requires TensorRT 8.6, and this version has a new set of headers, which requires an update to the bazel configure script. I also had to change `find_cuda_config.py`, because previously it only reported the major version of TensorRT back to the configure script, but we also need the minor version to distinguish between TensorRT 8.5 and below and TensorRT 8.6+.
PiperOrigin-RevId: 564927510
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 564914384
-
Committed by Ziyin Huang
PiperOrigin-RevId: 564909053
-
Committed by David Majnemer
GCC is missing some intrinsics; expand them by hand.
PiperOrigin-RevId: 564899448
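As an illustration of the kind of hand expansion involved (a hypothetical example; the commit does not say which intrinsics were affected): older GCC releases lack `_mm_loadu_si32`, which can be rebuilt from a `memcpy` load plus `_mm_cvtsi32_si128`:

```cpp
#include <immintrin.h>

#include <cstring>

// Hand-expanded stand-in for _mm_loadu_si32 (missing from older GCC):
// memcpy performs an unaligned, strict-aliasing-safe 4-byte load that
// compilers fold into a single mov; _mm_cvtsi32_si128 then zero-extends
// the value into the low lane of an XMM register.
static inline __m128i LoadU32AsVector(const void* p) {
  int v;
  std::memcpy(&v, p, sizeof(v));
  return _mm_cvtsi32_si128(v);
}
```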
-
Committed by Russell Power
PiperOrigin-RevId: 564893903
-
Committed by A. Unique TensorFlower
Create metrics:
1) '/pjrt/compiler/is_compiling_computation' to record whether the PjRt compiler is compiling computations.
2) '/pjrt/compiler/is_compiling_module' to record whether the PjRt compiler is compiling modules.
PiperOrigin-RevId: 564891869
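A minimal sketch of how such boolean metrics are typically defined and toggled; the file placement and `CompileComputation` wrapper are assumptions, but the `tsl::monitoring::Gauge` API shown is TSL's standard mechanism:

```cpp
#include "tsl/lib/monitoring/gauge.h"

namespace {

// Assumed wiring: a process-wide boolean gauge for the first metric name
// from the commit message (the module variant would mirror this).
auto* is_compiling_computation = tsl::monitoring::Gauge<bool, 0>::New(
    "/pjrt/compiler/is_compiling_computation",
    "Whether the PjRt compiler is currently compiling a computation.");

}  // namespace

void CompileComputation(/* ... */) {
  is_compiling_computation->GetCell()->Set(true);
  // ... run the actual compilation ...
  is_compiling_computation->GetCell()->Set(false);
}
```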
-
Committed by Jake Harmon
PiperOrigin-RevId: 564886962
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 564881319
-
Committed by A. Unique TensorFlower
This is for internal error logging only.
PiperOrigin-RevId: 564878979
-
Committed by Yu Feng
PiperOrigin-RevId: 564871149
-
Committed by Matt Callanan
PiperOrigin-RevId: 564864652
-
Committed by David Majnemer
For an 8x8 uint32_t transpose, we had:
- 4x `vinsertf128 ymm, ymm, xmm`
- 4x `vperm2f128`
These are very expensive instructions because they cross the 128-bit lane boundary. Now we have 8x `vinsertf128`, but crucially, the inserted operand comes from memory. This is important because modern x86 hardware can easily broadcast on load, which means that `vinsertf128` turns into a blend instead of a shuffle.
We use the same trick for handling matrices which are smaller than the vector width to accelerate the transpose: we still require a cross-lane step, but we cut all the other shuffles in half compared to SSE2.
While we are here, don't claim to support kernels which don't exist; claiming them makes the transpose system choose unoptimized implementations.
PiperOrigin-RevId: 564860657
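A minimal sketch of the memory-operand trick described above (the helper name is hypothetical, not the actual kernel). Because the inserted 128-bit operand comes straight from a load, compilers can emit `vinsertf128 ymm, ymm, m128, 1`, which decodes to a cheap load-plus-blend rather than a lane-crossing shuffle:

```cpp
#include <immintrin.h>

#include <cstdint>

// Build one 8-wide row from two 4-wide rows in memory. Compile with -mavx.
static inline __m256i CombineRows(const uint32_t* lo, const uint32_t* hi) {
  // Low 128 bits: plain load of the first row.
  __m256i v = _mm256_castsi128_si256(
      _mm_loadu_si128(reinterpret_cast<const __m128i*>(lo)));
  // High 128 bits: the insert's operand is a fresh load, so the compiler
  // can fold it into `vinsertf128 ymm, ymm, m128, 1` (a blend on modern
  // x86 cores) instead of a register-to-register lane-crossing shuffle.
  return _mm256_insertf128_si256(
      v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(hi)), 1);
}
```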
-
Committed by A. Unique TensorFlower
This check can be removed since tf2xla can run ops with non-const input even if the CompileTimeConstant attribute is set, with the help of value inference.
PiperOrigin-RevId: 564851049
-
Committed by Ralf W. Grosse-Kunstleve
PiperOrigin-RevId: 564845580
-
Committed by Benjamin Kramer
Updates LLVM usage to match [c1796be93fe5](https://github.com/llvm/llvm-project/commit/c1796be93fe5).
PiperOrigin-RevId: 564842806
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 564840225
-
Committed by Antonio Sanchez
PiperOrigin-RevId: 564825677
-
Committed by Fiona Lang
Without this additional bazelrc file, TSL will be built with the system compiler, which happens to be GCC. I also had to disable a warning that was raised by Clang. It puzzles me a bit that this is not needed for the TensorFlow build, which definitely uses Clang.
PiperOrigin-RevId: 564822820
-
Committed by Peter Hawkins
I've seen this file take over 5 minutes to build. Shard it by type.
PiperOrigin-RevId: 564820851
-
Committed by Matt Callanan
PiperOrigin-RevId: 564813085
-
Committed by A. Unique TensorFlower
This fixes a UB issue which occurs with newer versions of Clang (17+). The fix has also been upstreamed through https://github.com/NVIDIA/nccl/pull/916. In addition, I'm changing the handling of `enqueue.cc`, which needs to be compiled in CUDA mode under Clang; the previous solution of just passing the `-x cuda` option fails with CUDA 12+. I'm also correcting the version number that we set in the patch: not sure if this version is reported in some logs, but if it is, it should be correct.
PiperOrigin-RevId: 564811002
-
Committed by A. Unique TensorFlower
Add CopyToMemorySpace to the PjRtBuffer API. This CL does not implement any instance of the method, but adds the ability to do so in follow-up CLs.
PiperOrigin-RevId: 564807735
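A minimal sketch of what the new virtual might look like on `PjRtBuffer`; the exact signature and the default-unimplemented body are assumptions inferred from the commit message, not the actual API:

```cpp
#include <memory>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

namespace xla {

class PjRtMemorySpace;  // Represents a device-addressable memory space.

class PjRtBuffer {
 public:
  virtual ~PjRtBuffer() = default;

  // Hypothetical shape of the new method: copy this buffer into
  // `dst_memory_space`, returning the new buffer. Default-unimplemented so
  // existing PjRtBuffer subclasses keep compiling until they opt in, which
  // matches a CL that adds the method without implementing any instance.
  virtual absl::StatusOr<std::unique_ptr<PjRtBuffer>> CopyToMemorySpace(
      PjRtMemorySpace* dst_memory_space) {
    return absl::UnimplementedError("CopyToMemorySpace is not implemented.");
  }
};

}  // namespace xla
```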
-