- 15 Mar, 2017 26 commits
-
-
Committed by Eugene Brevdo
Change: 150130750
-
Committed by Yuefeng Zhou
Subtract back temp memory for reduction op because its temp memory becomes output memory. Change: 150130275
-
Committed by A. Unique TensorFlower
Change: 150128412
-
Committed by Peter Hawkins
[TF:XLA] Implement ResourceApplyAdagrad. Split XLA implementation of training ops into their own file. Change: 150125044
-
Committed by A. Unique TensorFlower
Change: 150122507
-
Committed by A. Unique TensorFlower
Change: 150117973
-
Committed by A. Unique TensorFlower
The original implementation was modeled after TensorFlow's existing ApplyAdamNonCuda and ExponentialMovingAverage classes, which are designed for concurrent lockless updates. However, my model training jobs kept failing with NaN errors when I ran them with more than 20 workers. On closer investigation, I found that the moving average implementation had two problems:
1) In some cases, the moving average of a sequence of positive numbers can actually be negative. Taking the square root of that moving average produces NaN. This happened to me with the moving average variable 'v' in the Adam algorithm.
2) When the moving average decay rates \beta_1 and \beta_2 are significantly less than one and the number of workers is large, the moving averages can become unstable, receiving huge updates from each worker. These instabilities keep growing until the model training jobs crash with NaN errors.
Change: 150115814
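Both problems can be reproduced with a toy simulation. This is a minimal sketch, not the TensorFlow implementation: it assumes that every worker reads the same stale value of the average and then subtracts its own `(1 - decay) * (stale - sample)` without locking. The function name `lockless_ema_step` and all numbers are made up for illustration.

```python
def lockless_ema_step(value, samples, decay):
    """Apply one round of racing moving-average updates.

    Assumption: every worker reads the same stale `value`, computes its
    own delta from it, and all deltas are then subtracted from the variable.
    """
    deltas = [(1.0 - decay) * (value - s) for s in samples]
    return value - sum(deltas)


# One worker: the ordinary update, which stays between the old value and the sample.
single = lockless_ema_step(1.0, [0.01], decay=0.9)

# Problem 1: twenty racing workers that all observe the positive sample 0.01.
racing = lockless_ema_step(1.0, [0.01] * 20, decay=0.9)
print(single, racing)  # the second value is negative

# Problem 2: with 30 workers, each round overshoots the sample 0.5 by more
# than the previous distance, so the average oscillates with growing amplitude.
unstable = 1.0
for _ in range(5):
    unstable = lockless_ema_step(unstable, [0.5] * 30, decay=0.9)
print(unstable)
```

In this simplified model, a round multiplies the distance to the sample by `1 - n_workers * (1 - decay)`. The average can change sign once `n_workers * (1 - decay)` exceeds 1, and it diverges once that product exceeds 2.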
-
Committed by A. Unique TensorFlower
Change: 150112019
-
Committed by Mustafa Ispir
Added integration test of Estimator. It shows the complete flow: train, evaluate, predict, and export. It can be used as documentation or as a reference. Change: 150111621
-
Committed by Peter Hawkins
Change the code that builds zero slots in slot_creator.py to use a zeros_initializer rather than Tensor, to simplify Python code that works with multiple graphs. Implement ResourceApplyMomentum in the XLA bridge. Change: 150111106
-
Committed by A. Unique TensorFlower
community/welcome.md page. Revised the content about where to ask questions, report issues, and so on.
* Re-added BibTeX entry to bib.md
* Created an attribution.md page to hold legal info.
* Created landing pages at community/index.md and about/index.md that follow the same format as landing pages for the rest of the site.
* Updated community/leftnav_files and about/leftnav_files to reflect new files.
* Moved info about the model zoo to about/uses.md page.
Change: 150102687
-
Committed by A. Unique TensorFlower
that allows users' models to fall back to running on CPUs when there is no GPU implementation for some particular ops. Change: 150101066
-
Committed by A. Unique TensorFlower
Need to investigate some fork/exec related bugs and this is not critical functionality. Change: 150099703
-
Committed by A. Unique TensorFlower
Change: 150099362
-
Committed by A. Unique TensorFlower
Change: 150097312
-
Committed by Adam Roberts
Change: 150095694
-
Committed by A. Unique TensorFlower
Silence exception thrown by graph explorer when an op lacks a value for the "_output_shapes" attribute. This is in line with how the graph explorer shows nothing about the shape if the shape cannot be determined. Also, fix awkward indentation of 1. Change: 150095236
-
Committed by A. Unique TensorFlower
Change: 150094096
-
Committed by A. Unique TensorFlower
Change: 150085723
-
Committed by Mustafa Ispir
Change: 150085654
-
Committed by Derek Murray
Change: 150082087
-
Committed by Justin Lebar
To make this work, we have to add a few IgnoreError() calls to TensorFlow. Change: 150080076
-
Committed by Peter Hawkins
Change: 150078943
-
Committed by A. Unique TensorFlower
- for each x value, cache the indexes and the 'advance' value.
- access input and output through direct pointer access instead of through eigen_tensor(b,y,x,c).
- special-case the 3 channel case.
- switch channel/width loops in the general case so that a single float[4] can be used for the cache.
After caching the 'advance' value, the values used during iteration could be converted to plain float[4] instead of using the CachedInterpolation object. Removed the special cases in CachedInterpolation::Advance; the special cases for speed are not needed when it's only called once per image. Added more test cases and benchmark cases.
Change: 150077397
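The first point, computing the per-x indexes once and reusing them for every row, can be sketched in plain Python. The surviving text does not name the interpolation kernel, so this sketch uses linear weights as a stand-in and a single channel; `cache_interpolation` and `resize_bilinear` are hypothetical names, not the functions touched by this commit.

```python
def cache_interpolation(out_size, in_size):
    """Precompute (lower, upper, lerp) for every output coordinate."""
    scale = in_size / out_size
    cache = []
    for i in range(out_size):
        src = i * scale
        lower = int(src)
        upper = min(lower + 1, in_size - 1)  # clamp at the image border
        cache.append((lower, upper, src - lower))
    return cache


def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D list of floats, reusing the cached x lookups per row."""
    in_h, in_w = len(image), len(image[0])
    xs = cache_interpolation(out_w, in_w)  # computed once, reused for each row
    ys = cache_interpolation(out_h, in_h)
    out = []
    for y_lo, y_hi, y_lerp in ys:
        top, bottom = image[y_lo], image[y_hi]
        row = []
        for x_lo, x_hi, x_lerp in xs:
            t = top[x_lo] + (top[x_hi] - top[x_lo]) * x_lerp
            b = bottom[x_lo] + (bottom[x_hi] - bottom[x_lo]) * x_lerp
            row.append(t + (b - t) * y_lerp)
        out.append(row)
    return out


print(resize_bilinear([[0.0, 2.0]], 1, 4))
```

The inner loop then only reads precomputed tuples, which is the property the commit message relies on when it replaces per-pixel index computation with cached values.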
-
Committed by Justin Lebar
Change: 150075644
-
Committed by A. Unique TensorFlower
Change: 150073043
-
- 14 Mar, 2017 14 commits
-
-
Committed by Shanqing Cai
Change: 150070251
-
Committed by A. Unique TensorFlower
Change: 150054942
-
Committed by A. Unique TensorFlower
Change: 150043496
-
Committed by A. Unique TensorFlower
TensorBoard samples steps, yet users want health pills at specific steps. This change makes the debugger plugin read directly from disk when the user requests a specific step. This is much slower (it could take minutes) than the alternative path of querying the multiplexer for sampled health pills. Change: 150041439
-
Committed by Suharsh Sivakumar
Change: 150038629
-
Committed by Patrick Nguyen
Change: 150028654
-
Committed by Alexey Surkov
1) for resumable uploads, log the last failure cause in the final AbortedError
2) log all retry attempts
Logging example from the unittest:
W0309 18:00:39.706445 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 0.995024 seconds (attempt 1 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706562 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 1.39374 seconds (attempt 2 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706605 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 2.63587 seconds (attempt 3 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706652 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 4.30139 seconds (attempt 4 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706693 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 8.32369 seconds (attempt 5 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706751 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 16.4618 seconds (attempt 6 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706820 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 32.2621 seconds (attempt 7 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706867 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 32.2265 seconds (attempt 8 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706903 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 32.8241 seconds (attempt 9 out of 10), caused by: Unavailable: Failed.
W0309 18:00:39.706939 687030 retrying_utils.cc:67] The operation failed and will be automatically retried in 32.4939 seconds (attempt 10 out of 10), caused by: Unavailable: Failed.
Change: 150027572
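The delays in this log grow roughly geometrically, level off near 32 seconds, and carry a sub-second random component. A retry loop that produces this kind of log might look like the Python sketch below. It is not the code in retrying_utils.cc: the name `retry_with_backoff`, its parameters, the jitter rule, and the use of `ConnectionError` as the retriable error are all assumptions made for illustration.

```python
import random
import time


def retry_with_backoff(operation, max_retries=10, max_delay=32.0,
                       sleep=time.sleep, log=print, jitter=random.random):
    """Call `operation`, retrying on ConnectionError with capped backoff.

    Every retry is logged, and the final error carries the last failure.
    """
    last_error = None
    for retries in range(max_retries + 1):
        try:
            return operation()
        except ConnectionError as err:  # stand-in for a retriable status
            last_error = err
        if retries == max_retries:
            break
        # Double the delay on each retry, cap it, then add up to 1 s of jitter.
        delay = min(2.0 ** retries, max_delay) + jitter()
        log("The operation failed and will be automatically retried in "
            f"{delay:g} seconds (attempt {retries + 1} out of {max_retries}), "
            f"caused by: {last_error}")
        sleep(delay)
    raise RuntimeError(
        f"All {max_retries} retry attempts failed. "
        f"The last failure: {last_error}")


# Usage: an operation that fails twice and then succeeds; sleeping is disabled.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Unavailable: Failed.")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda _: None))
```

Passing `sleep`, `log`, and `jitter` as parameters keeps the loop testable without real waiting, much as the unit test quoted above logs all ten attempts within the same millisecond.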
-
Committed by A. Unique TensorFlower
Change: 150026280
-
Committed by A. Unique TensorFlower
Change: 150026059
-
Committed by A. Unique TensorFlower
Change: 150016997
-
Committed by Bjarke Hammersholt Roune
Change: 150014720
-
Committed by A. Unique TensorFlower
Change: 150011637
-
Committed by A. Unique TensorFlower
With this, you can parse a variable-length feature of an Example into a padded Tensor. Change: 150009250
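To illustrate what "padded" means here without using the TensorFlow parsing API, the sketch below pads variable-length lists of values to a common length so that they form a rectangular batch. `pad_features`, `pad_value`, and `length` are hypothetical names, not part of the change.

```python
def pad_features(rows, pad_value=0, length=None):
    """Pad (or truncate) each row to `length`; default is the longest row."""
    if length is None:
        length = max((len(r) for r in rows), default=0)
    return [list(r[:length]) + [pad_value] * (length - len(r)) for r in rows]


batch = pad_features([[1, 2, 3], [4], []])
print(batch)
```

Every row in the result has the same length, which is what allows the batch to be stored as a dense tensor.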
-
Committed by Ian Langmore
These should have been done as part of an earlier change, but were missed. Using name_scope means self.name = the_initialization_name + "/", which makes it hard to set a distribution name and then know what the name is later. Change: 150008411
-