- 14 September 2023, 34 commits
-
Committed by Jake Harmon
PiperOrigin-RevId: 565223624
-
Committed by Ryan M. Lefever
Fix a bug in which we over-allocate space for slices when they are colocated with larger buffers. The interaction causing this behavior is as follows:
A) GlobalDecreasingSizeBestFitHeap::FindChunkCandidates() adds additional space to the last chunk in a sliced allocation to account for max_colocation_size.
B) When AlternateMemoryBestFitHeap::CheckPrefetchFit() computes slices_for_pending_chunks, it recomputes the size of the sliced allocation as the sum of the sizes of the chunks returned from A. Note that we do not recompute the allocation size in the non-sliced case.
C) Before committing a chunk, GlobalDecreasingSizeBestFitHeap::CommitChunk() changes the chunk's size to the size from B.
Thus, in the sliced case we keep the extra max_colocation_size space, since we recalculated the allocation size with it. In the non-sliced case, we adjust the chunk size back to what the request needs, so this change is a no-op for non-slices. PiperOrigin-RevId: 565217603
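To make the interaction concrete, here is a heavily simplified, hypothetical sketch of the double counting; the struct and function names are stand-ins, not the real memory-space-assignment types:

```cpp
#include <cstdint>
#include <vector>

struct Chunk { int64_t offset; int64_t size; };  // stand-in for heap chunks

// FindChunkCandidates-style step: pad the last slice chunk so the allocation
// also covers the largest colocated buffer (max_colocation_size).
std::vector<Chunk> FindSliceChunks(const std::vector<int64_t>& slice_sizes,
                                   int64_t max_colocation_size) {
  std::vector<Chunk> chunks;
  int64_t offset = 0;
  for (int64_t size : slice_sizes) {
    chunks.push_back({offset, size});
    offset += size;
  }
  if (!chunks.empty() && offset < max_colocation_size) {
    chunks.back().size += max_colocation_size - offset;  // colocation padding
  }
  return chunks;
}

// Pre-fix recomputation: summing the padded chunks bakes the colocation
// padding into the sliced allocation's size, so the commit step keeps it.
int64_t SlicedAllocationSizeWithBug(const std::vector<Chunk>& chunks) {
  int64_t total = 0;
  for (const Chunk& c : chunks) total += c.size;
  return total;
}

// Fixed idea: size the allocation from the requested slice sizes, as is
// already effectively done for non-sliced allocations.
int64_t SlicedAllocationSizeFixed(const std::vector<int64_t>& slice_sizes) {
  int64_t total = 0;
  for (int64_t s : slice_sizes) total += s;
  return total;
}
```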
-
Committed by James Mullenbach
There's a short period during ParameterServerStrategy initialization / cluster connection in which worker preemptions will lead to UnavailableErrors from CreateContext calls. This adds configurable retries to SetServerDef so that a single connection failure does not stop the whole job. Retries will be enabled as the default behavior for PSS in a followup change. PiperOrigin-RevId: 565214961
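A rough sketch of the retry behavior described here, using stand-in types rather than the real TensorFlow runtime classes; the retry count and backoff are assumptions, not values from this change. A worker preempted mid-connect can then succeed on a later attempt instead of failing the whole job.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-ins for the real TF status types.
enum class Code { kOk, kUnavailable, kOther };
struct Status {
  Code code;
  bool ok() const { return code == Code::kOk; }
};

// Retry a SetServerDef/CreateContext-style connection call while it keeps
// failing with Unavailable, up to a configurable number of attempts.
template <typename ConnectFn>
Status ConnectWithRetries(ConnectFn connect, int max_retries) {
  Status status = connect();
  for (int attempt = 0; !status.ok() &&
                        status.code == Code::kUnavailable &&
                        attempt < max_retries;
       ++attempt) {
    // Back off briefly before re-issuing the connection call.
    std::this_thread::sleep_for(std::chrono::seconds(1));
    status = connect();
  }
  return status;
}
```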
-
Committed by A. Unique TensorFlower
The loop that runs Autotune will fetch current values for available CPU and RAM on each iteration. This helps in situations where the hardware resources available to tf.data may be vertically scaled up or down based on usage during the process' lifetime. PiperOrigin-RevId: 565197940
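A schematic of the refreshed-budget idea; the helper names below are hypothetical stand-ins for the real resource queries used by tf.data, not its actual API:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the real "currently available CPU / RAM" queries.
int64_t AvailableCpuBudget() { return 8; }                 // e.g. usable cores
int64_t AvailableRamBudgetBytes() { return 16LL << 30; }   // e.g. 16 GiB

void RunOneAutotuneIteration() {
  // Re-read the budgets on every iteration so vertical scaling of the host
  // up or down during the process' lifetime is picked up automatically.
  const int64_t cpu_budget = AvailableCpuBudget();
  const int64_t ram_budget = AvailableRamBudgetBytes();
  // ... run one Autotune optimization step against the fresh budgets ...
  (void)cpu_budget;
  (void)ram_budget;
}
```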
-
Committed by Fergus Henderson
PiperOrigin-RevId: 565191065
-
Committed by Yu Feng
PiperOrigin-RevId: 565189870
-
Committed by Clive Verghese
PiperOrigin-RevId: 565187417
-
Committed by Fergus Henderson
to construct the TFLiteSettings FlatBuffer. PiperOrigin-RevId: 565184361
-
Committed by Son Tuan Vu
Vectorized column reductions might exceed the shared memory (shmem) budget. Limit the unroll factors to avoid this. PiperOrigin-RevId: 565170403
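Illustrative only: one way to cap the unroll factor so the per-block shared-memory footprint stays within budget. The function and parameter names are assumptions; the actual emitter heuristics and tile shapes are more involved.

```cpp
#include <cstdint>

// Pick the largest power-of-two unroll factor whose per-block shared-memory
// footprint fits the budget; otherwise keep halving it down to 1.
int64_t MaxUnrollFactor(int64_t bytes_per_element, int64_t threads_per_block,
                        int64_t shmem_budget_bytes, int64_t max_unroll = 8) {
  int64_t unroll = max_unroll;
  while (unroll > 1 &&
         unroll * bytes_per_element * threads_per_block > shmem_budget_bytes) {
    unroll /= 2;
  }
  return unroll;
}
```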
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 565170354
-
Committed by Swachhand Lokhande
The owned PjRtBuffers in `owned_executable_args` need to live until execution is complete. Currently this is achieved by blocking until all the executable outputs are ready. However, this seemed to cause performance overheads; see b/299683272 and b/300102691. With this change, we don't block until execution is complete. The ownership of `owned_executable_args` is moved to a lambda which is executed as a callback when the PjRtFuture returned by ExecutePortable is ready (which happens when the execution is complete). PiperOrigin-RevId: 565169152
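A simplified sketch of the ownership hand-off, using stand-in types rather than the real PjRt classes; the point is that the caller no longer blocks, yet the arguments outlive the execution:

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Buffer {};  // stand-in for PjRtBuffer

// Stand-in for a future that runs a callback once execution completes.
struct ExecuteDoneFuture {
  void OnReady(std::function<void()> cb) { on_ready_ = std::move(cb); }
  void SetReady() { if (on_ready_) on_ready_(); }
  std::function<void()> on_ready_;
};

void KeepArgsAliveUntilDone(
    ExecuteDoneFuture& execute_done,
    std::vector<std::unique_ptr<Buffer>> owned_executable_args) {
  // Previously: block here until all executable outputs were ready.
  // Now: hand ownership to a callback, so the buffers stay alive until the
  // execution future fires and are released there, without blocking.
  auto args = std::make_shared<std::vector<std::unique_ptr<Buffer>>>(
      std::move(owned_executable_args));
  execute_done.OnReady([args]() { args->clear(); });
}
```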
-
Committed by Hye Soo Yang
PiperOrigin-RevId: 565168980
-
Committed by Yu Feng
Such that users can act on the classes, adding the override methods there. PiperOrigin-RevId: 565168942
-
Committed by A. Unique TensorFlower
[XLA] Add WithReplicaGroups to the pattern matcher and modify tests to conform to the new pattern-matching format. Add a WithReplicaGroups implementation for HloInstructionPattern that matches against a collective instruction's replica groups. PiperOrigin-RevId: 565160403
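A hypothetical usage sketch only: the argument form of WithReplicaGroups is assumed here, not taken from this log, and the usual XLA pattern-matcher headers and `match` namespace alias are presumed to be in scope.

```cpp
// Assumes the XLA pattern matcher is available; WithReplicaGroups's exact
// signature is an assumption for illustration.
namespace m = xla::match;

bool IsAllReduceOverAdjacentPairs(xla::HloInstruction* instr) {
  // Match an all-reduce only if its replica groups are exactly {{0,1},{2,3}}.
  return Match(instr,
               m::AllReduce(m::Op()).WithReplicaGroups({{0, 1}, {2, 3}}));
}
```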
-
Committed by Hye Soo Yang
PiperOrigin-RevId: 565158613
-
Committed by Yishuang Pang
PiperOrigin-RevId: 565157773
-
Committed by Hye Soo Yang
Open source sparse_core_ops_utils*. PiperOrigin-RevId: 565156105
-
Committed by Son Tuan Vu
PiperOrigin-RevId: 565152217
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 565140913
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 565134913
-
Committed by Matt Callanan
PiperOrigin-RevId: 565132363
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 565121864
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 565119660
-
Committed by Grant Jensen
PiperOrigin-RevId: 565113938
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 565111478
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 565106502
-
Committed by Hye Soo Yang
PiperOrigin-RevId: 565097655
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 565094585
-
Committed by Jieying Luo
PiperOrigin-RevId: 565092976
-
Committed by Marcello Maggioni
[XLA] Rework dot() sharding propagation to look ahead at the sharding of user instructions, so that the sharding chosen for dot() agrees with its users where possible. PiperOrigin-RevId: 565086052
-
Committed by Oleg Shyshkov
PiperOrigin-RevId: 565083641
-
Committed by Marcello Maggioni
Small collectives might be better off when sunk, and there are other potential use cases. Also fix a bug where we were accepting reuse of the data that we were storing, and change the tests that used that pattern to match the fix. PiperOrigin-RevId: 565080772
-
Committed by Bixia Zheng
When decomposing collective-permute, do not chain the resulting Send and Recv instructions through control dependence. This is because the generated HLO program is correct even without the control dependence chaining. The purpose of the control dependence chaining is to support a scheduler, such as the latency hiding scheduler, so the chaining will instead be added in the latency hiding scheduler preparation pass. Not producing the control dependence chaining while decomposing collective-permute can also simplify the implementation of the collective-pipeliner in pipelining Send and Recv instructions. PiperOrigin-RevId: 565073772
-
Committed by Benjamin Kramer
Updates LLVM usage to match [8ebe1d1cc1e4](https://github.com/llvm/llvm-project/commit/8ebe1d1cc1e4) PiperOrigin-RevId: 565069541
-
- 13 September 2023, 6 commits
-
Committed by A. Unique TensorFlower
Without this additional bazelrc file, TSL will be built with the system compiler, which happens to be GCC. PiperOrigin-RevId: 565060297
-
Committed by Peter Hawkins
This change was merged upstream in https://github.com/openai/triton/pull/2068 but hasn't made it to the OpenXLA fork yet. Improves build time in OSS by not building a number of MLIR dialects/passes that are not needed. PiperOrigin-RevId: 565058458
-
Committed by Oleg Shyshkov
PiperOrigin-RevId: 565025664
-
Committed by George Karpenkov
PiperOrigin-RevId: 565024582
-
Committed by Tamás Danyluk
Unify iterator names to *_it; use auto* instead of auto if the type is a pointer type. PiperOrigin-RevId: 565016627
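A small illustration of the naming and auto* convention (not code from the change itself):

```cpp
#include <map>
#include <string>

void AppendBang(std::map<int, std::string>& values) {
  // Iterator variables are named *_it.
  auto value_it = values.find(42);
  if (value_it == values.end()) return;
  // Use auto* rather than auto when the deduced type is a pointer.
  auto* text_ptr = &value_it->second;
  text_ptr->append("!");
}
```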
-
Committed by Johannes Reifferscheid
This is another prefactoring to make Triton fusions compatible with FusionInterface and partially fused HLO. For the former, we need the LaunchDimension computation to be a separate function. For the latter, we change the launch dimension function signatures to no longer take an HloComputation, because we don't yet have that during fusion (at least not a complete one). For now, this change is a no-op, since we do not yet have any boundary functions for non-fusion ops. PiperOrigin-RevId: 565015567
-