- 14 Oct, 2022 40 commits
-
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 481016434
-
Committed by Mohammadreza Heydary
This CL introduces a new legalization pattern that rewrites the sigmoid_grad operation as `grad = dy * y * (1 - y)`, where `y = sigmoid(x)`. PiperOrigin-RevId: 481015565
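The identity behind this rewrite can be checked numerically. The following is a minimal Python sketch, not the legalization pattern itself; the function names `sigmoid`, `sigmoid_grad`, and `numeric_grad` are hypothetical helpers introduced only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Forward pass: y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(y: float, dy: float) -> float:
    """Backward pass written in terms of the forward output y = sigmoid(x).

    dy is the incoming gradient; the result is dy * y * (1 - y).
    """
    return dy * y * (1.0 - y)


def numeric_grad(x: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of d(sigmoid)/dx, used only as a check."""
    return (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
```

With `dy = 1.0`, `sigmoid_grad(sigmoid(x), 1.0)` should agree with `numeric_grad(x)` up to finite-difference error.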
-
Committed by Anlun Xu
JitExecutable is not required for AOT compilation, so XlaRuntimeCpuExecutable can own a runtime::Executable directly. PiperOrigin-RevId: 481012743
-
Committed by Faizan Muhammad
PiperOrigin-RevId: 481010554
-
Committed by David Dunleavy
PiperOrigin-RevId: 481008739
-
Committed by Anlun Xu
PiperOrigin-RevId: 481007346
-
Committed by David Dunleavy
PiperOrigin-RevId: 481006537
-
Committed by Faizan Muhammad
PiperOrigin-RevId: 481006100
-
Committed by Umer Javed
PiperOrigin-RevId: 481004699
-
Committed by Rahul Joshi
PiperOrigin-RevId: 481001377
-
Committed by Pankaj Kanwar
PiperOrigin-RevId: 480996400
-
Committed by Anlun Xu
PiperOrigin-RevId: 480996180
-
Committed by A. Unique TensorFlower
http://github.com/tensorflow/runtime/commit/36b7795640a99e44a9ab246c5495be5f99736c99. PiperOrigin-RevId: 480985467
-
Committed by A. Unique TensorFlower
[lite] When constructing a subgraph, use control dependencies from the model's metadata, if present. PiperOrigin-RevId: 480984280
-
Committed by Soo Sung
PiperOrigin-RevId: 480983235
-
Committed by Faizan Muhammad
PiperOrigin-RevId: 480977676
-
Committed by Eugene Zhulenev
PiperOrigin-RevId: 480974257
-
Committed by Jeffrey A. Dean
Use std::function objects rather than creating thousands of separate copies of very large routines in shape_util.h. Refactored the code to have an internal helper ForEachState struct, and to have separate code for the parallel vs. non-parallel versions of the core ForEachInternal functionality. (The compiler wasn't smart enough to track the parallel bit through the std::optional object with the ThreadPool, so it was emitting parallel code even for calls with the non-parallel variant, and due to templatization there were thousands of copies of these routines.) Avoid inlining some large routines in literal.h. Avoid inlining some automatically generated constructors for ShapeIndex, ProgramShape, etc. in shape.{h,cc}. Avoid inlining large routines on non-OK paths in status_macros.h. These changes drop the generated text size for a large binary by about 3.0 MB (~1.4%). PiperOrigin-RevId: 480973483
-
Committed by TensorFlower Gardener
Merge pull request #55780 from ROCmSoftwarePlatform:google_upstream_remove_rocm_build_flag_nextafter_op PiperOrigin-RevId: 480973150
-
Committed by Rahul Joshi
- Unify the code for handling operand->tuple sharding propagation when no sharding is present vs. refining the existing sharding (and reuse the code to refine existing sharding). - This also fixes an issue with handling empty tuple sub-elements, which are essentially not counted in the top-level tuple elements of the tuple sharding (since the code that refines existing sharding handles this correctly). PiperOrigin-RevId: 480971521
-
Committed by Luke Boyer
PiperOrigin-RevId: 480968915
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 480958124
-
Committed by Rahul Joshi
- Consider collective communication operations with channel_id as side-effecting only in non-SPMD mode. - Also handle all collective operations in the function. - Change the SPMD partitioning test to verify that any collective generated by partitioning does not have sharding. PiperOrigin-RevId: 480953154
-
Committed by Bruce Fontaine
PiperOrigin-RevId: 480950049
-
Committed by Haoyu Zhang
PiperOrigin-RevId: 480949877
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 480944931
-
Committed by Robert Suderman
The existing lowering missed cases where the width or height of the input or output is 1. These cases were difficult to address in the current implementation, so they were cleaned up. The power-of-2-specific code was then removed, as it was easier to just depend on the GCD to do the right thing. PiperOrigin-RevId: 480936509
-
Committed by Zhi An Ng
These 2 test cases are long-running because they test a Cartesian product of dtypes * transpose * adjoint * shapes (2 * 4 * 4 * 3 = 96). They are the bottleneck for the entire test suite finishing. By converting them into parameterized test cases, each case in the product becomes its own test case and can run on a different shard. PiperOrigin-RevId: 480935083
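The conversion described above can be sketched with the standard library alone. This is an illustrative sketch, not the code from the commit: the axis values (`DTYPES`, `BOOL_PAIRS`, `SHAPES`), the class name `ProductCaseTest`, and the placeholder assertion are all assumptions chosen only to reproduce the 2 * 4 * 4 * 3 = 96 count.

```python
import itertools
import unittest

# Hypothetical axis values; only their sizes (2, 4, 4, 3) come from the message.
DTYPES = ["float32", "float64"]
BOOL_PAIRS = list(itertools.product([False, True], repeat=2))  # 4 (a, b) pairs
SHAPES = [(2, 3), (4, 4), (1, 5)]

# One tuple per point in the Cartesian product: dtype, transpose, adjoint, shape.
CASES = list(itertools.product(DTYPES, BOOL_PAIRS, BOOL_PAIRS, SHAPES))


class ProductCaseTest(unittest.TestCase):
    """Test methods are attached below, one per case."""


def _make_test(dtype, transpose, adjoint, shape):
    def test(self):
        # Placeholder body; a real test would run the op under these settings.
        self.assertEqual(len(shape), 2)
    return test


# Each case becomes its own test method, so a runner can schedule (and shard)
# the cases independently instead of looping over them inside one long test.
for index, case in enumerate(CASES):
    setattr(ProductCaseTest, f"test_case_{index:02d}", _make_test(*case))
```

Because every case is a separate test method, a sharding test runner can distribute the 96 cases across shards rather than running them serially inside two long tests.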
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 480932683
-
Committed by Jenni Kilduff
This creates a const op filled with [0, 1, 2...iotaSize] values, then tiles it to the iota result shape. PiperOrigin-RevId: 480930548
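The const-then-tile idea can be illustrated on plain Python lists. This is a rough sketch under stated assumptions, not the lowering from the commit: it handles only 2-D result shapes, the helper name `iota_via_tile` is hypothetical, and it assumes the constant holds `0` through `size - 1` along the iota dimension.

```python
def iota_via_tile(shape, iota_dim):
    """Build a 2-D iota by tiling a 1-D constant along the other dimension.

    shape is (rows, cols); iota_dim is the dimension along which values count up.
    """
    rows, cols = shape
    # The "const": 0, 1, ..., size - 1 along the iota dimension (assumed range).
    base = list(range(shape[iota_dim]))
    if iota_dim == 1:
        # Repeat the row constant once per row.
        return [list(base) for _ in range(rows)]
    # iota_dim == 0: each value fills an entire row.
    return [[value] * cols for value in base]
```

For example, `iota_via_tile((2, 3), 1)` repeats the row `[0, 1, 2]` twice, while `iota_via_tile((2, 3), 0)` fills the first row with 0 and the second with 1.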
-
Committed by Jake Harmon
PiperOrigin-RevId: 480928553
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 480927625
-
Committed by Anlun Xu
PiperOrigin-RevId: 480927276
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 480926841
-
Committed by Zhi An Ng
PiperOrigin-RevId: 480926009
-
Committed by Clive Verghese
PiperOrigin-RevId: 480922000
-
Committed by A. Unique TensorFlower
Tiling `linalg.matmul` with any (< 3) number of tile sizes. PiperOrigin-RevId: 480921681
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 480919007
-
Committed by Emilio Cota
So that it can be included from header files without bringing in cpu_executable as a dependency. PiperOrigin-RevId: 480917177
-
Committed by Bernhard Bauer
The two files typically need to be consumed jointly, as gpu_delegate_native_jni.cc contains the native code needed for (successfully) initializing the native part of GpuDelegate. PiperOrigin-RevId: 480914567
-