Commits · ee131bf3e2c798c69df1a12bb346a3899009a7d1 · xxadev / tensorflow

Jul 17, 2019 · 20 commits

Simplify ruy's main loop. Most of the next_* business was unnecessary... · ee131bf3

Committed by Benoit Jacob on Jul 16, 2019

Simplify ruy's main loop. Most of the next_* business was unnecessary complication. This code didn't know whether it wanted to hide the latency of an atomic increment (60 cycles) or of a block coords computation (comparable). Now it's more intentional about hiding the atomic increment latency, because that's the one instruction here that will always have high latency; for the rest, we can only hope that the compiler will exploit any opportunity to inline the block computation and distribute its instructions so as to hide some of the latency.

The more important point is that while we don't really know which version would run faster, and the difference in latency will be small at most, there is a substantial code simplification here, and this matters because this is very central code. Notice in particular how the block coords computation used to be written twice, once before the loop body and once at the end of the loop body, and is now written only once.

PiperOrigin-RevId: 258429272

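The loop shape described above can be sketched in Python. This is a structural sketch only, not ruy's C++ code: `AtomicCounter`, `get_block_by_index`, `worker` and `run` are hypothetical names, a lock stands in for `std::atomic`, and the block traversal order is assumed to be plain row-major. The point is the ordering inside the loop: the next block is reserved first (so, in C++, the atomic increment can overlap with the current block's work), and block coordinates are computed in exactly one place.

```python
import threading


class AtomicCounter:
    """Minimal fetch-and-add counter standing in for std::atomic (hypothetical)."""

    def __init__(self, start):
        self._value = start
        self._lock = threading.Lock()

    def fetch_add(self, n):
        with self._lock:
            old = self._value
            self._value += n
            return old


def get_block_by_index(block_id, num_block_cols):
    # Assumed row-major order; the real block map may traverse blocks differently.
    return divmod(block_id, num_block_cols)


def worker(thread_id, num_blocks, num_block_cols, counter, process_block):
    # Each thread starts with the block whose index equals its thread id.
    block_id = thread_id
    while block_id < num_blocks:
        # Reserve the next block first, so the slow atomic op overlaps the work below.
        next_block_id = counter.fetch_add(1)
        # The block coordinates are computed once per iteration, in one place.
        row, col = get_block_by_index(block_id, num_block_cols)
        process_block(row, col)
        block_id = next_block_id


def run(num_block_rows, num_block_cols, num_threads, process_block):
    num_blocks = num_block_rows * num_block_cols
    # Blocks 0..num_threads-1 are claimed implicitly, so the counter starts after them.
    counter = AtomicCounter(num_threads)
    threads = [
        threading.Thread(
            target=worker,
            args=(t, num_blocks, num_block_cols, counter, process_block),
        )
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For example, `run(3, 4, 5, handler)` calls `handler(row, col)` exactly once for each of the 12 blocks, in whatever order the threads reach them.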

[tracing] add a benchmark test for scoped annotation. · dc9814a9

Committed by A. Unique TensorFlower on Jul 16, 2019

This is preparation for merging the internal xprof and external OSS versions of the annotation implementation; I need to make sure the benchmark is comparable or better.

PiperOrigin-RevId: 258425457

Do not force PartitionedCall DT_RESOURCE outputs to be on CPU device · f0fd2bed
Committed by Eugene Zhulenev on Jul 16, 2019
```
PiperOrigin-RevId: 258422018
```
BUGFIX: LinearOperatorAdjoint.matvec was passing an `adjoint_arg` kwarg, but · 5819055a
Committed by Ian Langmore on Jul 16, 2019
```
this doesn't make sense for matvec (and isn't in the base class matvec defn).
PiperOrigin-RevId: 258417922
```
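The bug pattern can be illustrated with a heavily simplified, hypothetical pair of classes (these are not `tf.linalg.LinearOperator`, whose `matmul` takes `adjoint_arg` but whose `matvec` does not). An adjoint wrapper should forward only the arguments the base-class `matvec` accepts, flipping `adjoint`; forwarding `adjoint_arg` as well makes the call fail with a `TypeError`.

```python
class LinearOperator:
    """Hypothetical, simplified dense operator (real-valued entries only)."""

    def __init__(self, rows):
        self.rows = rows  # matrix stored as a list of row lists

    def matvec(self, x, adjoint=False):
        # Base-class signature: there is no `adjoint_arg` parameter for matvec.
        m = self.rows
        if adjoint:
            # For real matrices the adjoint is the transpose.
            m = [list(col) for col in zip(*m)]
        return [sum(a * b for a, b in zip(row, x)) for row in m]


class LinearOperatorAdjoint(LinearOperator):
    """Represents the adjoint of another operator (hypothetical sketch)."""

    def __init__(self, operator):
        self.operator = operator

    def matvec(self, x, adjoint=False):
        # Fixed version: forward only `adjoint`, flipped. Also passing
        # `adjoint_arg=...` here would raise TypeError in the base matvec.
        return self.operator.matvec(x, adjoint=not adjoint)
```

With `A = LinearOperator([[1, 2], [3, 4]])`, `LinearOperatorAdjoint(A).matvec([1, 1])` computes the transpose times the vector, giving `[4, 6]`.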
Automated rollback of commit dbd23da8 · 26d27c5e
Committed by Andy Ly on Jul 16, 2019
```
PiperOrigin-RevId: 258413874
```
Mark input and output shapes attribute as deprecated in TRTEngineOp · 09f7d539
Committed by Smit Hinsu on Jul 16, 2019
```
PiperOrigin-RevId: 258412619
```

Nicer error message when watching things other than tensors/resource vars. · da3a241a

Committed by Akshay Modi on Jul 16, 2019

Watching an ndarray would fail on the next line in any case, because numpy.dtype has no `is_floating` attribute for the check there, so this gives it a nicer error message.

Also use the common RegisterType functionality to check for resource variables.

PiperOrigin-RevId: 258407797

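The idea of validating up front can be sketched as follows. `Tensor`, `ResourceVariable` and `watch` here are hypothetical stand-ins, not TensorFlow's classes, and the error text is illustrative rather than TensorFlow's actual message: unsupported objects are rejected with a descriptive error before any code reaches an attribute lookup like `value.dtype.is_floating`.

```python
class Tensor:
    """Hypothetical stand-in for a tensor."""

    def __init__(self, dtype):
        self.dtype = dtype


class ResourceVariable:
    """Hypothetical stand-in for a resource variable."""

    def __init__(self, dtype):
        self.dtype = dtype


def watch(watched, value):
    # Reject unsupported types with a clear message, instead of letting a
    # later `value.dtype.is_floating` check fail with an AttributeError.
    if not isinstance(value, (Tensor, ResourceVariable)):
        raise ValueError(
            "Can only watch tensors or resource variables, got an object "
            "of type {!r}.".format(type(value).__name__)
        )
    watched.append(value)
```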

[tf.data] Re-enables forward compatibility test. · 2bb4f008
Committed by Jiri Simsa on Jul 16, 2019
```
PiperOrigin-RevId: 258404510
```
Update docs of MultiWorkerMirroredStrategy (and some related docs in base class) · 31cc24f8
Committed by Priya Gupta on Jul 16, 2019
```
PiperOrigin-RevId: 258399687
```
Add support for freezing non-lowered While ops in 2.0. · dc72738d
Committed by Nupur Garg on Jul 16, 2019
```
PiperOrigin-RevId: 258395114
```

Support signed and unsigned quantization types · c99a2b76

Committed by Feng Liu on Jul 16, 2019

This patch contains various changes that allow the workflow to work for both the UINT8
and INT8 quantization schemes:

- The "restricted_output_params" in the "OpQuantSpec" is changed to a map, so
  the op definition can define restrictions for both UINT8 and INT8.

- INT8 quantization spec is added to the TFLite op definition.

- A "quantize_sign" flag is passed into the pre-quantize pass, so the spec and
  propagation for different signs can be configured by this flag.

Follow-up patches will read this flag from the user's command line.

PiperOrigin-RevId: 258393145

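Why a single restriction cannot serve both schemes can be shown with standard affine quantization arithmetic. This is a sketch, not the MLIR quantizer's code; `restricted_output_params` as a Python dict keyed by signedness is a hypothetical analogue of the map described above. For an output confined to [0, 255/256] (as for a logistic op, which TFLite's int8 spec pins to scale 1/256 and zero point -128), the scale is the same in both schemes but the zero point differs.

```python
def quant_range(signed):
    # 8-bit integer range for INT8 (signed) and UINT8 (unsigned).
    return (-128, 127) if signed else (0, 255)


def affine_params(rmin, rmax, signed):
    """Scale and zero point that map the real range [rmin, rmax] onto 8 bits."""
    qmin, qmax = quant_range(signed)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = qmin - round(rmin / scale)
    return scale, zero_point


def quantize(r, scale, zero_point, signed):
    # Round to the nearest step, shift by the zero point, then clamp.
    qmin, qmax = quant_range(signed)
    return max(qmin, min(qmax, round(r / scale) + zero_point))


# Hypothetical restriction map keyed by signedness: False = UINT8, True = INT8.
restricted_output_params = {
    signed: affine_params(0.0, 255.0 / 256.0, signed) for signed in (False, True)
}
```

Here `restricted_output_params[False]` is `(1/256, 0)` and `restricted_output_params[True]` is `(1/256, -128)`, so the real value 0.5 quantizes to 128 in UINT8 but to 0 in INT8.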

Updating scatter semantics to include an invariant. · a4c01b23
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258393073
```
Merge pull request #30681 from BBQuercus:patch-2 · 781535f3
Committed by TensorFlower Gardener on Jul 16, 2019
```
PiperOrigin-RevId: 258392805
```
Make sum_regularizer robust against None. · 738615cf
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258391933
```
Tags some tests. · e0e72724
Committed by Alexandre Passos on Jul 16, 2019
```
PiperOrigin-RevId: 258389587
```
Fix UBSan error in test that was testing unrealistic accumulation depth · 6841ac2a
Committed by Benoit Jacob on Jul 16, 2019
```
50001.

PiperOrigin-RevId: 258389320
```
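Some back-of-the-envelope arithmetic suggests why a depth of 50001 is unrealistic. This assumes (it is not stated in the commit) that the test accumulates uint8 products with zero points of 0 into a signed 32-bit accumulator, so each product is at most 255 * 255; signed overflow is undefined behavior, which is what UBSan reports.

```python
INT32_MAX = 2**31 - 1

# Assumption: uint8 operands with zero points of 0, so each product <= 255 * 255.
max_product = 255 * 255
depth = 50001

# Worst-case accumulator value at this depth, and the deepest safe accumulation.
worst_case_sum = depth * max_product
max_safe_depth = INT32_MAX // max_product
```

Under these assumptions the worst case is 3,251,315,025, well past INT32_MAX, while depths up to 33,025 cannot overflow.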
[tf.data] Reflect cl/256436076 in tf.data rewrites. · 369a0477
Committed by Jiri Simsa on Jul 16, 2019
```
PiperOrigin-RevId: 258385606
```
Compatibility compile fix · d2233029
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258380926
```
Add note to benchmark tool on NNAPI restrictions on /data/local/tmp/ on some Android P devices. · 307abb30
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258378187
```

Do not attempt to autopack sequences of scalars · 729c1dfa

Committed by Sergei Lebedev on Jul 16, 2019

This slightly speeds up the common case of converting shape lists to
tensors (when calling an op).

Before:

>>> %timeit tf.convert_to_tensor([1, 2, 3])
100000 loops, best of 3: 10.4 µs per loop
>>> %timeit tf.convert_to_tensor(np.array([1, 2, 3]))  # For reference.
100000 loops, best of 3: 6.47 µs per loop

After:

>>> %timeit tf.convert_to_tensor([1, 2, 3])
100000 loops, best of 3: 7.23 µs per loop

The remaining 1us is due to the necessary nest.flatten call. It might
be optimized by introducing nest.all which does not allocate a flat
list.

PiperOrigin-RevId: 258375416

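The `nest.all` idea mentioned above can be sketched with a lazy leaf iterator. `nest_all`, `_iter_leaves` and `is_python_scalar` are hypothetical helpers (not `tf.nest` functions), and the nesting rules are simplified to lists, tuples and dicts. Because `all()` consumes a generator, the check stops at the first failing leaf and never allocates a flat list; a converter could use such a check to skip autopacking when every leaf is a plain Python scalar.

```python
from collections.abc import Mapping


def _iter_leaves(structure):
    """Lazily yield the leaves of nested lists, tuples and dicts."""
    if isinstance(structure, Mapping):
        for value in structure.values():
            yield from _iter_leaves(value)
    elif isinstance(structure, (list, tuple)):
        for item in structure:
            yield from _iter_leaves(item)
    else:
        yield structure


def nest_all(predicate, structure):
    """True if predicate holds for every leaf; short-circuits on the first
    failure and never materializes a flat list of leaves."""
    return all(predicate(leaf) for leaf in _iter_leaves(structure))


def is_python_scalar(x):
    return isinstance(x, (bool, int, float, complex))
```

For instance, `nest_all(is_python_scalar, [1, 2, 3])` is true, so a shape list like this one would take the fast path.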

Jul 16, 2019 · 20 commits

Update docs for general element handling. · 06d0ca49

Committed by Andrew Audibert on Jul 16, 2019

Dataset elements are no longer limited to nested structures of Tensors. This change updates the docs to refer to "dataset elements" instead of nested structures of Tensors.

We may want to also deprecate from_tensors/from_tensor_slices and replace them with better-named from_element/from_element_slices. Since this is more controversial, we can do it as a separate change.

PiperOrigin-RevId: 258375042

Merge pull request #30683 from RonLek:tf.Print_render · 021eb7ab
Committed by TensorFlower Gardener on Jul 16, 2019
```
PiperOrigin-RevId: 258373033
```

Fix performance regression (b/137615815) introduced by new platform · 8c1064d9

Committed by Benoit Jacob on Jul 16, 2019

#defines - they were tested directly by #ifdef, and were being defined
by path.h.  As tune.cc did not #include path.h, it did not enable its
platform-specific tuning code, resulting in a performance regression
in cases relying on tuning for maximal performance --- in-order ARM.

To prevent that from happening again, this moves the platform defines
to a new platform.h and forces users to use a RUY_PLATFORM(X) function
macro, so that if they fail to #include platform.h, they get a compilation
error.

PiperOrigin-RevId: 258372624

Merge pull request #30704 from siju-samuel:depr_removed_contrib_gan_examples · f2df2c28
Committed by TensorFlower Gardener on Jul 16, 2019
```
PiperOrigin-RevId: 258371737
```
Configure LLVMDialect and extract pointer size. · 7e8efb0f
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258361644
```

Add absl logging functions to the output_streams supported by tf.print (v2).... · b542d679

Committed by Tamara Norman on Jul 16, 2019

Add absl logging functions to the output_streams supported by tf.print (v2). This ensures we can print to these streams without needing to use tf.compat.v1

PiperOrigin-RevId: 258360763

Make XLA GPU and XLA MLIR GPU emitters share the same HLO optimizations passes. · 2ba24e8d

Committed by A. Unique TensorFlower on Jul 16, 2019

For now, the MLIR backend will expect the same HLO as input as the GPU backend
does. Hence, we need to run the same required passes. Also use the same HLO
level optimizations, so that we get comparable HLO.

PiperOrigin-RevId: 258349785

Merge pull request #30618 from ROCmSoftwarePlatform:google_upstream_stateful_random_ops · 66b97d20
Committed by TensorFlower Gardener on Jul 16, 2019
```
PiperOrigin-RevId: 258349360
```
[XLA:GPU] Skip multi-output fusion in IsProducerConsumerMultiOutputFusible(). · e1f2843e
Committed by Alexander Belyaev on Jul 16, 2019
```
PiperOrigin-RevId: 258349333
```
Merge pull request #30614 from ROCmSoftwarePlatform:google_upstream_einsum_op · 104a469d
Committed by TensorFlower Gardener on Jul 16, 2019
```
PiperOrigin-RevId: 258349270
```
Merge pull request #30613 from ROCmSoftwarePlatform:google_upstream_avgpooling_op_part_two · 69fdaa00
Committed by TensorFlower Gardener on Jul 16, 2019
```
PiperOrigin-RevId: 258349112
```

Add a skeleton for the MLIR GPU backend. · 17343b2f

Committed by A. Unique TensorFlower on Jul 16, 2019

The skeleton returns unimplemented for all members of the Compiler
interface. It can only be used when combined with the FailoverCompiler
and the existing GPU backend. This is also the configuration that is
registered.

PiperOrigin-RevId: 258333034

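The failover arrangement can be sketched in Python. XLA's FailoverCompiler is C++ and works with `Status` values; here an exception stands in for an Unimplemented status, and every class name is hypothetical. The assumed behavior is: try the primary (MLIR) compiler, and if it reports Unimplemented, use the secondary (existing GPU) compiler instead, while any other error still propagates.

```python
class Unimplemented(Exception):
    """Stand-in for an Unimplemented status (hypothetical)."""


class SkeletonMlirCompiler:
    """Hypothetical skeleton: every entry point is unimplemented."""

    def compile(self, module):
        raise Unimplemented("MLIR GPU backend: compile")


class ExistingGpuCompiler:
    """Hypothetical stand-in for the existing GPU backend."""

    def compile(self, module):
        return "gpu-executable({})".format(module)


class FailoverCompiler:
    """Try the primary compiler; on Unimplemented, fall back to the secondary."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def compile(self, module):
        try:
            return self.primary.compile(module)
        except Unimplemented:
            # Only "not implemented yet" triggers the fallback.
            return self.secondary.compile(module)
```

With this arrangement, `FailoverCompiler(SkeletonMlirCompiler(), ExistingGpuCompiler())` behaves exactly like the existing backend until the skeleton starts implementing entry points.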

compat: Update forward compatibility horizon to 2019-07-16 · 205e1435
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258324385
```
Update GraphDef version to 98. · cd1960a6
Committed by A. Unique TensorFlower on Jul 16, 2019
```
PiperOrigin-RevId: 258324381
```
Make it flexible to choose the actual TFLite op resolver in the benchmark tool. · 49cd2732
Committed by Chao Mei on Jul 16, 2019
```
PiperOrigin-RevId: 258317482
```
Fix for tf.function in v1 control flow. · 2a182a80
Committed by Saurabh Saxena on Jul 16, 2019
```
PiperOrigin-RevId: 258313758
```

Use the right lookup key when the graph debug info is imported and E2E tests · 4cb97078

Committed by Feng Liu on Jul 16, 2019

When the graph debug info is generated by TensorFlow, the node stack trace is
indexed by the string concatenation of function name and node name, so we have
to use the same string to look up the stack traces for the nodes.

E2E tests for displaying stack traces in the error messages are added to verify
the stack traces for nodes from both graphs and functions can be displayed
correctly.

PiperOrigin-RevId: 258311493

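The fix amounts to building the lookup key the same way the writer did. In this sketch `stack_trace_key` and `lookup_trace` are hypothetical helpers, and the exact key layout (`node@function` for nodes inside functions, the bare node name for top-level graph nodes) is an assumption, not a statement of TensorFlow's format. Looking a function node up by its bare name misses its trace.

```python
def stack_trace_key(node_name, function_name=None):
    """Hypothetical key builder; writer and reader must share it.

    The "@" separator and the node-then-function order are assumptions.
    """
    if function_name:
        return node_name + "@" + function_name
    return node_name


def lookup_trace(traces, node_name, function_name=None):
    # Use the same key construction that was used when the traces were stored.
    return traces.get(stack_trace_key(node_name, function_name))
```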

Add Bool support in transpose op. · b51a1b25
Committed by Haoliang Zhang on Jul 15, 2019
```
PiperOrigin-RevId: 258304540
```

[Eager] Change rewrite registry from a (singleton) std::map to an std::array. · c1f7f64b

Committed by Derek Murray on Jul 15, 2019

The registry is keyed by an enum and therefore has a fixed number of keys, so an array indexed by the enum avoids a map lookup at the start of every eager op execution.

PiperOrigin-RevId: 258296647

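The data-structure change can be mirrored in Python with a list indexed by an `IntEnum`, the analogue of a `std::array` sized by the enum's key count. The `Phase` values and `RewriteRegistry` class are hypothetical, and the Python version only shows the structure, not the C++ speedup. It relies on the enum values being contiguous from 0, which is what makes direct indexing valid.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical phases; values must be contiguous from 0 to index a list.
    PRE_EXECUTION = 0
    POST_PLACEMENT = 1
    POST_REWRITE = 2


class RewriteRegistry:
    """Fixed-size list indexed by enum value instead of a dict (map) lookup."""

    def __init__(self):
        # One slot per phase, allocated up front.
        self._rewrites = [[] for _ in Phase]

    def register(self, phase, rewrite):
        self._rewrites[phase].append(rewrite)

    def run(self, phase, op):
        # Direct indexing: no hashing or tree search per op execution.
        for rewrite in self._rewrites[phase]:
            op = rewrite(op)
        return op
```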

Fix undefined behavior in FunctionLibraryDefinition::Find(). · b3d4737e

Committed by Derek Murray on Jul 15, 2019

With some combinations of compiler options (though surprisingly not just enabling ASAN), taking the address of the first member through a null `std::shared_ptr<FunctionDefAndOpRegistration>` fails.

PiperOrigin-RevId: 258295758


xxadev / tensorflow is identical to its fork source project
