- July 17, 2019: 20 commits
Committed by Benoit Jacob
Simplify ruy's main loop. Most of the next_* business was unnecessary complication. This code didn't know whether it wanted to hide the latency of an atomic increment (60 cycles) or of a block-coordinates computation (comparable). Now it is more intentional about hiding the atomic increment latency, because that is the one instruction here that will always have high latency; for the rest, we can only hope that the compiler will exploit any opportunity to inline the block computation and distribute its instructions so as to hide some of the latency. The more important point is that while we don't really know what would run faster, and any difference will at most have a small impact on latency, there is a substantial code simplification here, and this matters because this is very central code. Notice in particular that the block coords computation used to be written twice, once before the loop body and once at the end of the loop body, and is now written only once. PiperOrigin-RevId: 258429272
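A minimal C++ sketch of the loop shape this commit describes, assuming a shared `std::atomic<int>` block counter and a simple row-major block mapping. `BlockCoords`, `GetBlockCoords`, `WorkerLoop`, and `CollectBlocks` are hypothetical names, not ruy's actual code. The atomic increment that claims the next block is issued before the current block's work, so its latency can overlap with that work, and the block coordinates are computed in one place only.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical block coordinates in a grid of blocks.
struct BlockCoords {
  int row;
  int col;
};

// Hypothetical row-major mapping from a linear block index to coordinates.
// (ruy's real block map is more elaborate.)
inline BlockCoords GetBlockCoords(int index, int num_block_cols) {
  assert(num_block_cols > 0);
  return BlockCoords{index / num_block_cols, index % num_block_cols};
}

// Each worker claims block indices from a shared atomic counter. The
// fetch_add that claims the following block is issued before the current
// block is processed, so its latency can overlap with the work. Block
// coordinates are computed in exactly one place: inside the loop body.
template <typename WorkFn>
void WorkerLoop(std::atomic<int>* counter, int num_blocks, int num_block_cols,
                WorkFn work) {
  int block = counter->fetch_add(1, std::memory_order_relaxed);
  while (block < num_blocks) {
    // Start the high-latency atomic increment early.
    const int claimed = counter->fetch_add(1, std::memory_order_relaxed);
    const BlockCoords coords = GetBlockCoords(block, num_block_cols);
    work(coords);
    block = claimed;
  }
}

// Single-threaded usage helper: returns the coordinates visited, in order.
inline std::vector<BlockCoords> CollectBlocks(int num_blocks,
                                              int num_block_cols) {
  std::atomic<int> counter{0};
  std::vector<BlockCoords> visited;
  WorkerLoop(&counter, num_blocks, num_block_cols,
             [&visited](const BlockCoords& c) { visited.push_back(c); });
  return visited;
}
```

Indices claimed past `num_blocks` are simply discarded, so each thread may over-claim by one without skipping or duplicating work.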
-
Committed by A. Unique TensorFlower
This is preparation for merging the internal xprof and external OSS versions of the annotation implementation; I need to make sure the benchmark is comparable or better. PiperOrigin-RevId: 258425457
-
Committed by Eugene Zhulenev
PiperOrigin-RevId: 258422018
-
Committed by Ian Langmore
This doesn't make sense for matvec (and isn't in the base class matvec definition). PiperOrigin-RevId: 258417922
-
Committed by Andy Ly
PiperOrigin-RevId: 258413874
-
Committed by Smit Hinsu
PiperOrigin-RevId: 258412619
-
Committed by Akshay Modi
It would fail on ndarrays in any case, since a numpy.dtype doesn't have the is_floating check used on the next line, so this gives it a nicer error message. Also use the common RegisterType functionality to check for resource variables. PiperOrigin-RevId: 258407797
-
Committed by Jiri Simsa
PiperOrigin-RevId: 258404510
-
Committed by Priya Gupta
PiperOrigin-RevId: 258399687
-
Committed by Nupur Garg
PiperOrigin-RevId: 258395114
-
Committed by Feng Liu
This patch contains various changes that allow the workflow to work for both the UINT8 and INT8 quantization schemes: (1) "restricted_output_params" in "OpQuantSpec" is changed to a map, so the op definition can define restrictions for both UINT8 and INT8; (2) an INT8 quantization spec is added to the TFLite op definition; (3) a "quantize_sign" flag is passed into the pre-quantize pass, so the spec and propagation for each sign can be configured by this flag. There are follow-up patches to read this flag from the user's command line. PiperOrigin-RevId: 258393145
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258393073
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 258392805
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258391933
-
Committed by Alexandre Passos
PiperOrigin-RevId: 258389587
-
Committed by Benoit Jacob
50001. PiperOrigin-RevId: 258389320
-
Committed by Jiri Simsa
PiperOrigin-RevId: 258385606
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258380926
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258378187
-
Committed by Sergei Lebedev
This slightly speeds up the common case of converting shape lists to tensors (when calling an op).
Before:
>>> %timeit tf.convert_to_tensor([1, 2, 3])
100000 loops, best of 3: 10.4 µs per loop
>>> %timeit tf.convert_to_tensor(np.array([1, 2, 3]))  # For reference.
100000 loops, best of 3: 6.47 µs per loop
After:
>>> %timeit tf.convert_to_tensor([1, 2, 3])
100000 loops, best of 3: 7.23 µs per loop
The remaining ~1 µs is due to the necessary nest.flatten call. It might be optimized by introducing a nest.all that does not allocate a flat list. PiperOrigin-RevId: 258375416
-
- July 16, 2019: 20 commits
Committed by Andrew Audibert
Dataset elements are no longer limited to nested structures of Tensors. This change updates the docs to refer to "dataset elements" instead of nested structures of Tensors. We may also want to deprecate from_tensors/from_tensor_slices and replace them with the better-named from_element/from_element_slices. Since this is more controversial, we can do it as a separate change. PiperOrigin-RevId: 258375042
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 258373033
-
Committed by Benoit Jacob
The platform #defines were tested directly with #ifdef and were defined in path.h. Because tune.cc did not #include path.h, it did not enable its platform-specific tuning code, resulting in a performance regression in the cases that rely on tuning for maximal performance, namely in-order ARM. To prevent that from happening again, this moves the platform defines to a new platform.h and forces users to use a RUY_PLATFORM(X) function macro, so that if they fail to #include platform.h, they get a compilation error. PiperOrigin-RevId: 258372624
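A minimal sketch of the function-macro pattern described above. Only `RUY_PLATFORM` comes from the commit message; the `RUY_DONOTUSEDIRECTLY_ARM_64` flag name, the `__aarch64__` check, and `UsesArm64Tuning` are illustrative assumptions. Because every flag is always defined to 0 or 1 and is queried only through `RUY_PLATFORM(X)`, forgetting the header becomes a compile error rather than a silent `#ifdef` miss: in an `#if`, remaining identifiers are replaced by 0, so an undefined `RUY_PLATFORM(ARM_64)` becomes `0(0)`, which the preprocessor rejects.

```cpp
// ---- platform.h (sketch) ----
// Every platform flag is always defined, to 0 or 1, under a name that
// discourages testing it directly with #ifdef.
#if defined(__aarch64__)
#define RUY_DONOTUSEDIRECTLY_ARM_64 1
#else
#define RUY_DONOTUSEDIRECTLY_ARM_64 0
#endif

// The only sanctioned way to query a platform. Token pasting maps
// RUY_PLATFORM(ARM_64) to the RUY_DONOTUSEDIRECTLY_ARM_64 flag.
#define RUY_PLATFORM(X) ((RUY_DONOTUSEDIRECTLY_##X) != 0)

// ---- tune.cc (sketch) ----
// If platform.h were not included, `#if RUY_PLATFORM(ARM_64)` would be
// preprocessed as `#if 0(0)` and fail to compile, instead of silently
// taking the #else branch as a plain #ifdef on an undefined flag would.
constexpr bool UsesArm64Tuning() {
#if RUY_PLATFORM(ARM_64)
  return true;   // Platform-specific tuning path.
#else
  return false;  // Generic path.
#endif
}
```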
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 258371737
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258361644
-
Committed by Tamara Norman
Add absl logging functions to the output_streams supported by tf.print (v2). This ensures we can print to these streams without needing to use tf.compat.v1. PiperOrigin-RevId: 258360763
-
Committed by A. Unique TensorFlower
For now, the MLIR backend will expect the same HLO as input as the GPU backend does. Hence, we need to run the same required passes. Also use the same HLO-level optimizations, so that we get comparable HLO. PiperOrigin-RevId: 258349785
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 258349360
-
Committed by Alexander Belyaev
PiperOrigin-RevId: 258349333
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 258349270
-
Committed by TensorFlower Gardener
PiperOrigin-RevId: 258349112
-
Committed by A. Unique TensorFlower
The skeleton returns unimplemented for all members of the Compiler interface. It can only be used when combined with the FailoverCompiler and the existing GPU backend. This is also the configuration that is registered. PiperOrigin-RevId: 258333034
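A minimal sketch of the failover arrangement described above, assuming a compiler interface whose members return a status and a wrapper that retries on a secondary compiler whenever the primary reports "unimplemented". All names here (`StatusCode`, `Status`, `Compiler`, `RunPasses`, `SkeletonCompiler`, `ExistingGpuCompiler`, and this `FailoverCompiler`) are hypothetical simplifications, not XLA's actual interfaces.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified status type.
enum class StatusCode { kOk, kUnimplemented };

struct Status {
  StatusCode code;
  std::string produced_by;  // Which compiler handled the request.
};

// Hypothetical, simplified compiler interface with a single member.
class Compiler {
 public:
  virtual ~Compiler() = default;
  virtual Status RunPasses() = 0;
};

// The skeleton: every member returns "unimplemented".
class SkeletonCompiler : public Compiler {
 public:
  Status RunPasses() override {
    return {StatusCode::kUnimplemented, "skeleton"};
  }
};

// Stand-in for the existing GPU backend.
class ExistingGpuCompiler : public Compiler {
 public:
  Status RunPasses() override { return {StatusCode::kOk, "gpu"}; }
};

// Forwards to the primary and falls back to the secondary whenever the
// primary reports kUnimplemented.
class FailoverCompiler : public Compiler {
 public:
  FailoverCompiler(std::unique_ptr<Compiler> primary,
                   std::unique_ptr<Compiler> secondary)
      : primary_(std::move(primary)), secondary_(std::move(secondary)) {
    assert(primary_ && secondary_);
  }

  Status RunPasses() override {
    Status status = primary_->RunPasses();
    if (status.code == StatusCode::kUnimplemented) {
      return secondary_->RunPasses();
    }
    return status;
  }

 private:
  std::unique_ptr<Compiler> primary_;
  std::unique_ptr<Compiler> secondary_;
};
```

With the skeleton as primary and the GPU backend as secondary, every request is currently served by the GPU backend, and members can move to the new backend one at a time as they stop returning "unimplemented".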
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258324385
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 258324381
-
Committed by Chao Mei
PiperOrigin-RevId: 258317482
-
Committed by Saurabh Saxena
PiperOrigin-RevId: 258313758
-
Committed by Feng Liu
When the graph debug info is generated by TensorFlow, the node stack trace is indexed by the string concatenation of the function name and the node name, so we have to use the same string to look up the stack trace for the nodes. E2E tests for displaying stack traces in error messages are added to verify that the stack traces for nodes from both graphs and functions are displayed correctly. PiperOrigin-RevId: 258311493
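A minimal sketch of the lookup rule described above: the importer must build exactly the same key that was used when the debug info was generated. The `@` separator, the node-name-first ordering, the bare-name key for top-level graph nodes, `MakeTraceKey`, `LookupTrace`, and the plain `std::map` standing in for the debug-info traces are all assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical key builder. Top-level graph nodes use their bare name;
// nodes inside a function use the concatenation of node name and function
// name. The "@" separator and the ordering are assumptions.
std::string MakeTraceKey(const std::string& node_name,
                         const std::string& function_name) {
  assert(!node_name.empty());
  if (function_name.empty()) return node_name;
  return node_name + "@" + function_name;
}

// Hypothetical stand-in for the traces stored in the graph debug info.
using TraceMap = std::map<std::string, std::string>;

// Returns the stack trace for a node, or an empty string if none exists.
// Looking up by the bare node name alone would miss function-body nodes.
std::string LookupTrace(const TraceMap& traces, const std::string& node_name,
                        const std::string& function_name) {
  auto it = traces.find(MakeTraceKey(node_name, function_name));
  return it == traces.end() ? std::string() : it->second;
}
```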
-
Committed by Haoliang Zhang
PiperOrigin-RevId: 258304540
-
Committed by Derek Murray
The registry is keyed by an enum and contains a static number of keys (based on an enum), so this avoids a map lookup at the start of every eager op execution. PiperOrigin-RevId: 258296647
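A minimal sketch of the general idea, assuming a registry whose keys are the values of a small enum with a trailing count sentinel: storing one slot per enumerator in a `std::array` turns each lookup into an index computation instead of a map search. `Key`, `kCount`, `EnumRegistry`, and `NameRegistry` are hypothetical names, not the TensorFlow types touched by this commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical key set; kCount is a sentinel equal to the number of keys.
enum class Key : std::size_t { kA, kB, kC, kCount };

template <typename Value>
class EnumRegistry {
 public:
  void Register(Key key, Value value) {
    slots_[Index(key)] = std::move(value);
  }

  // Constant-time lookup: an array index, no hashing or tree search.
  const Value& Lookup(Key key) const { return slots_[Index(key)]; }

 private:
  static std::size_t Index(Key key) {
    assert(key != Key::kCount);  // The sentinel is not a real key.
    return static_cast<std::size_t>(key);
  }

  // One value-initialized slot per enumerator, sized at compile time.
  std::array<Value, static_cast<std::size_t>(Key::kCount)> slots_{};
};

// Example instantiation mapping each key to a name.
using NameRegistry = EnumRegistry<std::string>;
```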
-
Committed by Derek Murray
With some combinations of compiler options (though surprisingly not just enabling ASAN), taking the address of the first member through a null `std::shared_ptr<FunctionDefAndOpRegistration>` fails. PiperOrigin-RevId: 258295758
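A minimal illustration of the hazard and one way to avoid it, using a hypothetical `Registration` struct in place of `FunctionDefAndOpRegistration`. Forming `&reg->first_member` goes through `shared_ptr::operator->`, which requires a non-null pointer, so with a null `reg` it is undefined behavior even though only an address is computed; checked standard-library builds may trap on it. The message does not say which fix the commit applied; this only shows the general null-check pattern.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for FunctionDefAndOpRegistration.
struct Registration {
  std::string fdef;  // First member.
  int op_data = 0;
};

// Unsafe when reg is null:
//   const std::string* p = &reg->fdef;
// Safe: test for null before forming the member address.
const std::string* FindFdef(const std::shared_ptr<Registration>& reg) {
  return reg ? &reg->fdef : nullptr;
}
```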
-