Commits · 71ff7c847344f3527f2b88532552f1e67d39a9c1 · qq_38905368 / tensorflow

Mar 30, 2016 · 9 commits

Better bounds checking for the segment reduction ops. · 71ff7c84
Committed by Josh Levenberg on Mar 29, 2016
```
Also fix some warnings about unsafe conversions.
Change: 118519927
```
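To illustrate what bounds checking means for a segment reduction, here is a minimal pure-Python sketch. It is not the TensorFlow kernel: the function name `segment_sum` and the `num_segments` parameter are assumptions made for the example, and the real ops operate on tensors rather than lists.

```python
def segment_sum(data, segment_ids, num_segments):
    """Sum the values of `data` per segment, rejecting out-of-range ids.

    Hypothetical helper for illustration only.
    """
    if len(data) != len(segment_ids):
        raise ValueError(
            f"data has {len(data)} entries but segment_ids has {len(segment_ids)}"
        )
    out = [0] * num_segments
    for i, (value, seg) in enumerate(zip(data, segment_ids)):
        # The bounds check: without it a bad id would index outside `out`
        # (or, for a negative id in Python, silently land in the wrong slot).
        if not 0 <= seg < num_segments:
            raise IndexError(
                f"segment_ids[{i}] = {seg} is not in [0, {num_segments})"
            )
        out[seg] += value
    return out


print(segment_sum([1, 2, 3, 4], [0, 0, 1, 2], 3))
```

Note that a negative id is rejected explicitly; plain Python list indexing would otherwise accept it and update the wrong segment.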
In assertArrayNear() check that the arrays have the same length. · 4acc2dad
Committed by A. Unique TensorFlower on Mar 29, 2016
```
Add a test that we actually check for array length (we previously did not).
Change: 118518452
```
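A minimal sketch of a length-aware "near" assertion, assuming plain Python sequences. The name `assert_array_near` and its signature are illustrative and are not the TensorFlow test utility itself. One way a missing length check can go unnoticed is that `zip` stops at the shorter input, so extra elements are never compared.

```python
def assert_array_near(expected, actual, tol):
    """Fail unless both sequences have equal length and close elements.

    Hypothetical helper for illustration only.
    """
    # Check lengths first: zip() below would silently ignore any
    # trailing elements of the longer sequence.
    if len(expected) != len(actual):
        raise AssertionError(
            f"length mismatch: {len(expected)} vs {len(actual)}"
        )
    for i, (e, a) in enumerate(zip(expected, actual)):
        if abs(e - a) > tol:
            raise AssertionError(
                f"element {i} differs: {e} vs {a} (tolerance {tol})"
            )


assert_array_near([1.0, 2.0], [1.0, 2.0000001], tol=1e-6)  # passes
```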

Optimized DepthwiseConvBackpropInputOp for CPU. · 9d9ad03c

Committed by A. Unique TensorFlower on Mar 29, 2016

// OLD
Benchmark                             Time(ns)    CPU(ns) Iterations
--------------------------------------------------------------------
BM_ConvFloatDepthwiseBkInCPU1_conv0  207770233  207338129        100  796.0M items/s 32_112_112_3_8_24_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv1  715403538  713939287        100  616.4M items/s 32_112_112_64_1_64_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv2  357349749  356594057        100  617.0M items/s 32_56_56_128_1_128_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv3  274697435  274160117        100  802.7M items/s 32_56_56_128_1_128_3_3_2_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv4   87072020   86874244        100  633.1M items/s 32_28_28_128_1_128_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv5   87172482   86948501        100  632.4M items/s 32_14_14_512_1_512_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv6   46763611   46620163        100  589.4M items/s 32_7_7_1024_1_1024_3_3_1_2_cpu1

// NEW 1-thread
Benchmark                             Time(ns)    CPU(ns) Iterations
--------------------------------------------------------------------
BM_ConvFloatDepthwiseBkInCPU1_conv0   60173061   59839526        100  2.7G items/s 32_112_112_3_8_24_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv1   99396102   99143542        100  4.3G items/s 32_112_112_64_1_64_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv2   39376616   39226953        100  5.5G items/s 32_56_56_128_1_128_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv3   35987577   35843443        100  6.0G items/s 32_56_56_128_1_128_3_3_2_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv4    9665813    9600518        100  5.6G items/s 32_28_28_128_1_128_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv5   12498989   12427035        100  4.3G items/s 32_14_14_512_1_512_3_3_1_2_cpu1
BM_ConvFloatDepthwiseBkInCPU1_conv6    8459759    8397047        100  3.2G items/s 32_7_7_1024_1_1024_3_3_1_2_cpu1

// NEW 4-threads
Benchmark                             Time(ns)    CPU(ns) Iterations
--------------------------------------------------------------------
BM_ConvFloatDepthwiseBkInCPU4_conv0   30696635  101663830        100  5.3G items/s 32_112_112_3_8_24_3_3_1_2_cpu4
BM_ConvFloatDepthwiseBkInCPU4_conv1   68884630  198616710        100  6.3G items/s 32_112_112_64_1_64_3_3_1_2_cpu4
BM_ConvFloatDepthwiseBkInCPU4_conv2   16948037   50360587        100  12.7G items/s 32_56_56_128_1_128_3_3_1_2_cpu4
BM_ConvFloatDepthwiseBkInCPU4_conv3   15834408   46873689        100  13.6G items/s 32_56_56_128_1_128_3_3_2_2_cpu4
BM_ConvFloatDepthwiseBkInCPU4_conv4    3904734   11659079        167  13.8G items/s 32_28_28_128_1_128_3_3_1_2_cpu4
BM_ConvFloatDepthwiseBkInCPU4_conv5    3482083   12555105        188  15.5G items/s 32_14_14_512_1_512_3_3_1_2_cpu4
BM_ConvFloatDepthwiseBkInCPU4_conv6    2330680    8593020        281  11.5G items/s 32_7_7_1024_1_1024_3_3_1_2_cpu4
Change: 118514706
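
The tables above report raw times only. A short sketch that turns the wall-clock `Time(ns)` column into speedup ratios may make the comparison easier to read. The numbers are copied from the tables; the variable and function names are ours.

```python
# Wall-clock Time(ns) values copied from the benchmark tables above,
# in order conv0 .. conv6.
OLD_NS = [207770233, 715403538, 357349749, 274697435,
          87072020, 87172482, 46763611]
NEW_1T_NS = [60173061, 99396102, 39376616, 35987577,
             9665813, 12498989, 8459759]
NEW_4T_NS = [30696635, 68884630, 16948037, 15834408,
             3904734, 3482083, 2330680]


def speedups(before, after):
    """Return the before/after time ratio for each benchmark."""
    return [b / a for b, a in zip(before, after)]


one_thread = speedups(OLD_NS, NEW_1T_NS)
four_threads = speedups(OLD_NS, NEW_4T_NS)

for i, (s1, s4) in enumerate(zip(one_thread, four_threads)):
    print(f"conv{i}: {s1:.2f}x with 1 thread, {s4:.2f}x with 4 threads")
```

With these figures the single-thread speedup ranges from about 3.5x (conv0) to about 9.1x (conv2), and the 4-thread speedup from about 6.8x (conv0) to about 25x (conv5).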

Make TensorFlow transpose_op_test only take 2 seconds to run · 62e0d8e1
Committed by Vijay Vasudevan on Mar 29, 2016
```
instead of 48 (on my machine).
Change: 118512022
```
Leverage index list to further speed up the computation of the convolution · 5a130818
Committed by Benoit Steiner on Mar 29, 2016
```
gradients by 1 to 10% depending on the size of the convolution kernel.
Change: 118505660
```
Add link to the inception model serving tutorial. · 89aab41d
Committed by Fangwei Li on Mar 29, 2016
```
Change: 118497433
```

First pass at enabling non-square strides for convolutions: · 6a073b39

Committed by A. Unique TensorFlower on Mar 29, 2016

- Change the stride, and in_stride template arguments of Eigen
  SpatialConvolution, SpatialConvolutionBackwardInput, and
  SpatialConvolutionBackwardKernel to row_stride, col_stride, row_in_stride,
  col_in_stride.

- Change tensorflow kernels to pass the additional stride parameters.

- Rationalize the place where we swap row/col: swap just before calling Eigen.

This just enables the plumbing. Non-square strides are still forbidden in the ops.
Change: 118484322
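
To show what separate row and column strides imply for output shapes, here is a small sketch using TensorFlow's documented output-size formulas for `VALID` and `SAME` padding, applied independently per dimension. The function name and argument order are assumptions for illustration. As the commit notes, the ops themselves still rejected non-square strides at this point, so this is arithmetic only.

```python
def conv_output_size(in_rows, in_cols, filter_rows, filter_cols,
                     row_stride, col_stride, padding="VALID"):
    """Return (out_rows, out_cols) for a 2-D convolution.

    Hypothetical helper: each dimension uses its own stride.
    """
    def ceil_div(a, b):
        # Integer ceiling division without floating point.
        return -(-a // b)

    if padding == "VALID":
        out_rows = ceil_div(in_rows - filter_rows + 1, row_stride)
        out_cols = ceil_div(in_cols - filter_cols + 1, col_stride)
    elif padding == "SAME":
        out_rows = ceil_div(in_rows, row_stride)
        out_cols = ceil_div(in_cols, col_stride)
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    return out_rows, out_cols


# A 112x112 input, 3x3 filter, row stride 1 and column stride 2.
print(conv_output_size(112, 112, 3, 3, 1, 2, "VALID"))
print(conv_output_size(112, 112, 3, 3, 1, 2, "SAME"))
```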

Parse uploaded pbtxt in chunks to allow for loading of large pbtxt files, this... · 622daf83

Committed by A. Unique TensorFlower on Mar 29, 2016

Parse uploaded pbtxt in chunks to allow for loading of large pbtxt files. This only affects files uploaded with the file chooser menu, not XHR requests.
Change: 118479897

Fixing some implicit int64->32 downcast errors. (In most cases, by · dff1e630

Committed by David G. Andersen on Mar 29, 2016

having the ops explicitly return a failure that they can't handle
overly-large inputs).  Most of these should never affect correct
tf programs until people get a lot more memory in their machines.
Change: 118476613
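
The difference between a silent downcast and an explicit failure can be sketched in a few lines of Python. Both helper names are hypothetical; `wrap_to_int32` only mimics two's-complement truncation to show what the unchecked conversion would produce.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_to_int32(value):
    """Mimic a silent int64 -> int32 truncation (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def checked_int32(value, what="value"):
    """Return `value` unchanged, or fail loudly if it does not fit."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{what} = {value} does not fit in int32")
    return value


num_elements = 2**31  # one past the largest int32
print(wrap_to_int32(num_elements))  # silently becomes a negative number
try:
    checked_int32(num_elements, "num_elements")
except OverflowError as err:
    print("rejected:", err)
```

Failing explicitly trades a crash or corrupted index for a clear error message, which matters only for inputs with more than about two billion elements.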

Mar 29, 2016 · 31 commits
- Add explicit casts for a few ops that assumed they could implicitly cast to/from · fe60adfb
  Committed by A. Unique TensorFlower on Mar 29, 2016
```
float and int for all types T. (The assumption doesn't hold for Eigen::half.)

These were all ops I could find in CPU implementation; probably, some remain
for GPU (and some other problems remain before we can turn on Eigen::half for
all of them).
Change: 118464463
```
- Simple Timeline support for Open Source using Chrome Trace format. · 7b8cd164
  Committed by A. Unique TensorFlower on Mar 29, 2016
```
Adds a Timeline class which can convert a StepStats protobuf into
a JSON-formatted trace.  This trace can be loaded into any Chrome
web browser via the chrome://tracing URL.
Change: 118461709
```
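For readers unfamiliar with the Chrome Trace format, the following sketch builds a trace by hand from a list of timing records. It does not use the Timeline class or the StepStats protobuf: the record fields (`op`, `start_us`, `duration_us`, `thread`) are hypothetical stand-ins. The output keys (`traceEvents`, `name`, `ph`, `ts`, `dur`, `pid`, `tid`) follow the Trace Event format, where `"ph": "X"` marks a complete event with a start time and duration in microseconds.

```python
import json


def to_chrome_trace(op_records):
    """Convert simple timing records into a Chrome Trace JSON string.

    `op_records` is a list of dicts with hypothetical keys:
    op, start_us, duration_us and (optionally) thread.
    """
    events = []
    for rec in op_records:
        events.append({
            "name": rec["op"],
            "ph": "X",                   # complete event
            "ts": rec["start_us"],       # start time, microseconds
            "dur": rec["duration_us"],   # duration, microseconds
            "pid": 0,
            "tid": rec.get("thread", 0),
        })
    return json.dumps({"traceEvents": events}, indent=2)


records = [
    {"op": "MatMul", "start_us": 0, "duration_us": 120},
    {"op": "Relu", "start_us": 125, "duration_us": 15, "thread": 1},
]
print(to_chrome_trace(records))
```

Saving the printed JSON to a file and loading it from the chrome://tracing page should display the two ops as bars on separate rows.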
- Add half support for the first basic ops, namely Cast and Const. · 160ac73d
  Committed by A. Unique TensorFlower on Mar 29, 2016
```
Change: 118445579
```
- Add half support for the first basic ops, namely Cast and Const. · b6d66ffd
  Committed by A. Unique TensorFlower on Mar 29, 2016
```
Change: 118445207
```
- Make the ReluGrad and Relu6Grad functors work with types that are not · cac50c32
  Committed by A. Unique TensorFlower on Mar 29, 2016
```
implicitly convertible from bool (like Eigen::half).

Note: This does not enable half support for ReluOp in itself, although it is
an important step of the way.
Change: 118444029
```
- Pull some common lists of headers and deps out of the common android libraries. · 51c0ce54
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
Change: 118427688
```
- Further improved the performance of the backward pass of the convolution · f4acbbf2
  Committed by Benoit Steiner on Mar 28, 2016
```
Change: 118414827
```
- Upgraded to the latest version of Eigen · 4ec0823f
  Committed by Benoit Steiner on Mar 28, 2016
```
Change: 118414762
```
- Update generated Python Op docs. · db7b68a2
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
Change: 118414301
```
- Moved the higher order functions into functional_ops.py, to avoid some... · c7ab36c6
  Committed by Yuan Yu on Mar 28, 2016
```
Moved the higher order functions into functional_ops.py, to avoid some potential circular dependency.
Change: 118412058
```
- Add more accurate cost estimates for scalar_fmod2_op and scalar_mod2_op. · 1e941981
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
Change: 118410742
```
- Use the correct target and config inside the while loop for wait_for_session(). · f3737491
  Committed by Sherry Moore on Mar 28, 2016
```
Change: 118409321
```
- 250% GPU speed up of the convolution gradient computation wrt the weights · 1e1beefc
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
for Eigen.
Change: 118408303
```
- Add memory fences to ensure that the reader's deliberate racy use of · 98dc8b9a
  Committed by David G. Andersen on Mar 28, 2016
```
is_initialized is safe.
Change: 118405243
```
- Make write_other_members understand __all__ · 785fa7a3
  Committed by Geoffrey Irving on Mar 28, 2016
```
This moves the "Other..." documentation of make_all from tf.image to
tf.contrib.layers, since tf.contrib.layers has no __all__ and thus leaks all
sorts of random things.
Change: 118403996
```
- Fix cancellation bug in the gRPC runtime, and reduce unnecessary tag creation. · 6ac44c71
  Committed by Derek Murray on Mar 28, 2016
```
This change accounts for the fact that the `RequestCancelled` callback
will still be called after a `RequestReceived(ok = false)` callback.

In refactoring this code, this change also adds the ability for
enqueued requests to enable cancellation selectively, which reduces
the allocation count of completion queue tags for client-side
cancellation that serve no useful purpose. Only methods that have a
server-side implementation of cancellation should register a
cancellation tag.
Change: 118403144
```
- Update generated Python Op docs. · 6d71a394
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
Change: 118401805
```
- Update ops-related pbtxt files. · 54eebe93
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
Change: 118401706
```
- Add gather_nd op · 3f92ed68
  Committed by Eugene Brevdo on Mar 28, 2016
```
Change: 118401267
```
- Update generated Python Op docs. · 0f303909
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
Change: 118397198
```
- Make get_collection() always return a first-level copy of the · 45d069bd
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
collection list.  The previous behavior was to return the list
itself: If other items were later added to the collection the
list returned initially would show them.

Add get_collection_ref() to return the collection list itself.
Change: 118396574
```
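The copy-versus-reference distinction described in this commit can be sketched with a tiny stand-in class. `MiniGraph` is hypothetical and only models the collection bookkeeping; the method names mirror those in the commit message, but nothing else about the real graph class is implied.

```python
class MiniGraph:
    """Hypothetical stand-in that only models named collections."""

    def __init__(self):
        self._collections = {}

    def add_to_collection(self, name, value):
        self._collections.setdefault(name, []).append(value)

    def get_collection(self, name):
        # First-level (shallow) copy: later additions do not show up
        # in a list that was returned earlier.
        return list(self._collections.get(name, []))

    def get_collection_ref(self, name):
        # The list itself: later additions are visible to the caller.
        return self._collections.setdefault(name, [])


g = MiniGraph()
g.add_to_collection("losses", "l2_loss")
snapshot = g.get_collection("losses")
live = g.get_collection_ref("losses")
g.add_to_collection("losses", "cross_entropy")
print(snapshot)  # unchanged by the later addition
print(live)      # reflects the later addition
```

The copy is shallow, so the items inside the list are still shared; only the list container is independent.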
- Internal change for automatically updating docs. · 3854839b
  Committed by Josh Levenberg on Mar 28, 2016
```
Change: 118393590
```
- Build android_tensorflow_lib_lite with -Os to reduce the size of... · cd3ea60c
  Committed by Andrew Harp on Mar 28, 2016
```
Build android_tensorflow_lib_lite with -Os to reduce the size of libandroid_tensorflow_lib.so. Kernel code in android_tensorflow_lib is still built with -O2 for performance reasons.
Change: 118389861
```
- Add alwayslink=1 to Android build rules. This allows native Android binaries to · 4cacbfb6
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
be built depending on these rules. Otherwise the ops and some other components
are dropped because there are no link-time references to them.

This change has no effect on .so files needed by normal Android NDK Java apps.
Change: 118387009
```
- Change selective registration of op kernels to work off of the class name · e30d80d4
  Committed by A. Unique TensorFlower on Mar 28, 2016
```
instead of the filename.

Change FindKernelDef to also return the class name, to help a tool that
generates ops_to_register.h to find the set of class names.
Change: 118381472
```
- Fix the example in sync_replicas_optimizer. · 69eaa5d7
  Committed by Jianmin Chen on Mar 28, 2016
```
Change: 118376780
```
- Nullify android_tensorflow_lib srcs if not building for Android to prevent misleading errors. · 844bd17c
  Committed by Andrew Harp on Mar 28, 2016
```
Change: 118376252
```
- Move tf-graph-params parameters to render.ts. This removes typing/comments/code repetition. · 15c74be4
  Committed by Dan Smilkov on Mar 28, 2016
```
Change: 118375255
```
- Bring back fast_cpp_protos bazel configuration and remove 64MB limit of... · e57fc75f
  Committed by Dan Smilkov on Mar 28, 2016
```
Bring back fast_cpp_protos bazel configuration and remove 64MB limit of protobufs. This speeds up graph serialization by ~15x for users building TensorFlow from source.

Note that you can now install a faster pip binary for protobuf using the instructions
in https://github.com/tensorflow/tensorflow/commit/8ac009728db931ef3119a337bd23250c89bc7efe

This only affects building and running from within the bazel environment.
Change: 118374862
```
- Update generated Python Op docs. · d95fa1bb
  Committed by Josh Levenberg on Mar 28, 2016
```
Change: 118369028
```
- Add dynamic_pad parameter to tf.batch and tf.batch_join. If true, · e745b2b0
  Committed by Eugene Brevdo on Mar 28, 2016
```
allows variable sized inputs and pads upon dequeue.  Added unit tests.

During testing, identified a small C++ bug where TensorMap slice fails if
the slice shape has a zero length somewhere.  For now, avoid this error by
checking the slice's size and returning early if the slice is empty
(this is the correct thing to do).
Change: 118362981
```
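As a rough picture of what padding variable-sized inputs on dequeue involves, the sketch below pads 1-D Python lists to a common length, returning early for an empty batch. It is an assumption-laden simplification: `pad_batch` is a hypothetical helper, it handles one dimension only, and it does not call tf.batch or reproduce the queue machinery.

```python
def pad_batch(sequences, pad_value=0):
    """Pad 1-D sequences to the length of the longest one.

    Hypothetical helper for illustration only.
    """
    # Nothing to pad: return early instead of calling max() on an
    # empty collection, which would raise.
    if not sequences:
        return []
    max_len = max(len(seq) for seq in sequences)
    return [
        list(seq) + [pad_value] * (max_len - len(seq))
        for seq in sequences
    ]


print(pad_batch([[1, 2, 3], [4], []]))
```

Zero-length elements are padded like any other, which loosely parallels the empty-slice case that the commit handles by returning early.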
