- 17 Aug, 2017 24 commits
-
-
Committed by Pete Warden
PiperOrigin-RevId: 165504820
-
Committed by A. Unique TensorFlower
tensorflow::grpc::EncodeRecvTensorResponseToByteBuffer() and tensorflow::grpc::EncodeTensorToByteBuffer()
The Swap() operation allows the assignment to the *result ByteBuffer to be performed without the allocation performed by grpc::ByteBuffer::operator=().
PiperOrigin-RevId: 165504787
-
Committed by A. Unique TensorFlower
Unlike doing a direct assignment, set_hparam will perform the same sort of type checking that parse/parse_json would do. PiperOrigin-RevId: 165504169
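The difference between direct assignment and a checked setter can be sketched as follows. This is only an illustration: `TinyHParams` is a hypothetical stand-in written for this example, not the TensorFlow class, and its checking rule (the new value must have the type the parameter was created with) is an assumption rather than the library's actual behaviour.

```python
class TinyHParams:
    """Hypothetical, minimal hyperparameter container (not the TensorFlow class)."""

    def __init__(self, **kwargs):
        # Remember the type each hyperparameter was created with.
        self._types = {name: type(value) for name, value in kwargs.items()}
        for name, value in kwargs.items():
            setattr(self, name, value)

    def set_hparam(self, name, value):
        """Assign `value` to an existing hyperparameter after a type check."""
        if name not in self._types:
            raise KeyError("unknown hyperparameter: %s" % name)
        expected = self._types[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # when the parameter was not created as a bool.
        if isinstance(value, bool) and expected is not bool:
            raise ValueError("%s expects %s, got bool" % (name, expected.__name__))
        if not isinstance(value, expected):
            raise ValueError(
                "%s expects %s, got %s"
                % (name, expected.__name__, type(value).__name__)
            )
        setattr(self, name, value)


hparams = TinyHParams(learning_rate=0.1, num_layers=2)
hparams.set_hparam("num_layers", 3)   # accepted: int replaces int
hparams.num_layers = "three"          # direct assignment: nothing checks the type
```

With direct assignment the wrong type is stored silently; calling `set_hparam("num_layers", "three")` on this sketch would raise `ValueError` instead.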
-
Committed by Jiri Simsa
PiperOrigin-RevId: 165500724
-
Committed by Yao Zhang
PiperOrigin-RevId: 165495580
-
Committed by Jonathan Hseu
PiperOrigin-RevId: 165485537
-
Committed by Alexandre Passos
PiperOrigin-RevId: 165483637
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165481485
-
Committed by Benoit Steiner
PiperOrigin-RevId: 165476236
-
Committed by A. Unique TensorFlower
reasons:
* Adds metadata to the memcpy.
* Automatically takes care of the case where the element being copied is a scalar.
PiperOrigin-RevId: 165474762
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165470975
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165469441
-
Committed by Alexandre Passos
PiperOrigin-RevId: 165467540
-
Committed by Mark Daoust
PiperOrigin-RevId: 165461843
-
Committed by Yangzihao Wang
Remove the dependency of cudnn_disable_conv_1x1_optimization flag of filter with the same size as input shape. PiperOrigin-RevId: 165461092
-
Committed by Chris Leary
Assigning a layout should not mutate the aliasing properties of the graph. PiperOrigin-RevId: 165460991
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165460865
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165457067
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165455593
-
Committed by A. Unique TensorFlower
std::stable_sort is slower than std::sort, so run std::sort and then go back and deal with runs of equal entries.

before:
CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
Benchmark Time(ns) CPU(ns) Iterations
-------------------------------------------------------------------------
BM_TopK_CPU_1_100_1_16 10472 29186 48042 9.107M items/s topk_r_1_c_100_k_1_th_16
BM_TopK_CPU_1_100_2_16 10860 29423 65483 8.782M items/s topk_r_1_c_100_k_2_th_16
BM_TopK_CPU_1_100_10_16 11604 31041 61883 8.218M items/s topk_r_1_c_100_k_10_th_16
BM_TopK_CPU_1_100_50_16 12823 33145 55596 7.437M items/s topk_r_1_c_100_k_50_th_16
BM_TopK_CPU_1_100_100_16 13980 35452 49324 6.822M items/s topk_r_1_c_100_k_100_th_16
BM_TopK_CPU_32_100_1_16 13227 33997 53782 230.713M items/s topk_r_32_c_100_k_1_th_16
BM_TopK_CPU_32_100_2_16 26811 61550 25983 113.824M items/s topk_r_32_c_100_k_2_th_16
BM_TopK_CPU_32_100_10_16 61673 105223 8326 49.483M items/s topk_r_32_c_100_k_10_th_16
BM_TopK_CPU_32_100_50_16 65507 349948 10000 46.587M items/s topk_r_32_c_100_k_50_th_16
BM_TopK_CPU_32_100_100_16 151183 198478 4602 20.186M items/s topk_r_32_c_100_k_100_th_16
BM_TopK_CPU_128_100_1_16 22387 52298 31619 545.262M items/s topk_r_128_c_100_k_1_th_16
BM_TopK_CPU_128_100_2_16 92960 141800 8176 131.315M items/s topk_r_128_c_100_k_2_th_16
BM_TopK_CPU_128_100_10_16 82928 749170 8219 147.200M items/s topk_r_128_c_100_k_10_th_16
BM_TopK_CPU_128_100_50_16 145420 1639555 4578 83.943M items/s topk_r_128_c_100_k_50_th_16
BM_TopK_CPU_128_100_100_16 106392 745496 6634 114.737M items/s topk_r_128_c_100_k_100_th_16
BM_TopK_CPU_128_1000_1_16 110449 277875 6463 1.079G items/s topk_r_128_c_1000_k_1_th_16
BM_TopK_CPU_128_1000_2_16 88770 864674 7004 1.343G items/s topk_r_128_c_1000_k_2_th_16
BM_TopK_CPU_128_1000_10_16 115540 1391511 5702 1.032G items/s topk_r_128_c_1000_k_10_th_16
BM_TopK_CPU_128_1000_50_16 282633 3840198 2435 431.904M items/s topk_r_128_c_1000_k_50_th_16
BM_TopK_CPU_128_1000_100_16 357064 4882968 1943 341.873M items/s topk_r_128_c_1000_k_100_th_16
BM_TopK_CPU_128_1000_500_16 790974 11271216 848 154.329M items/s topk_r_128_c_1000_k_500_th_16
BM_TopK_CPU_128_1000_1000_16 661784 9318164 1000 184.457M items/s topk_r_128_c_1000_k_1000_th_16
BM_TopK_CPU_16_10000_10000_16 1078471 15056662 648 141.485M items/s topk_nmt_r_16_c_10000_k_10000_th_16
BM_TopK_CPU_16_20000_20000_16 2244454 32823888 300 135.969M items/s topk_nmt_r_16_c_20000_k_20000_th_16
BM_TopK_CPU_16_50000_50000_16 6501366 93873780 100 117.351M items/s topk_nmt_r_16_c_50000_k_50000_th_16
BM_TopK_CPU_16_100000_100000_16 14934618 203006700 48 102.171M items/s topk_nmt_r_16_c_100000_k_100000_th_16
BM_TopK_CPU_16_35000_35000_16 4637517 64356466 159 115.160M items/s topk_nmt_r_16_c_35000_k_35000_th_16
BM_TopK_CPU_16_70000_70000_16 9817851 137707012 72 108.793M items/s topk_nmt_r_16_c_70000_k_70000_th_16
BM_TopK_CPU_16_175000_175000_16 24743808 379850278 23 107.917M items/s topk_nmt_r_16_c_175000_k_175000_th_16
BM_TopK_CPU_16_350000_350000_16 76416261 1056343492 9 69.888M items/s topk_nmt_r_16_c_350000_k_350000_th_16
BM_TopK_CPU_128_10000_10000_16 7575714 110759309 90 161.134M items/s topk_nmt_r_128_c_10000_k_10000_th_16
BM_TopK_CPU_128_20000_20000_16 16608244 245613340 44 147.000M items/s topk_nmt_r_128_c_20000_k_20000_th_16
BM_TopK_CPU_128_50000_50000_16 46355581 687585085 15 131.667M items/s topk_nmt_r_128_c_50000_k_50000_th_16
BM_TopK_CPU_128_100000_100000_16 100402412 1545361856 6 121.581M items/s topk_nmt_r_128_c_100000_k_100000_th_16
BM_TopK_CPU_128_35000_35000_16 31705595 475528520 23 134.754M items/s topk_nmt_r_128_c_35000_k_35000_th_16
BM_TopK_CPU_128_70000_70000_16 71193517 1072309510 9 120.024M items/s topk_nmt_r_128_c_70000_k_70000_th_16
BM_TopK_CPU_128_175000_175000_16 209401218 3182226697 3 102.016M items/s topk_nmt_r_128_c_175000_k_175000_th_16
BM_TopK_CPU_128_350000_350000_16 523319920 8118120696 1 81.641M items/s topk_nmt_r_128_c_350000_k_350000_th_16

after:
CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
Benchmark Time(ns) CPU(ns) Iterations
-------------------------------------------------------------------------
BM_TopK_CPU_1_100_1_16 10029 27379 69810 9.509M items/s topk_r_1_c_100_k_1_th_16
BM_TopK_CPU_1_100_2_16 10624 28889 66771 8.977M items/s topk_r_1_c_100_k_2_th_16
BM_TopK_CPU_1_100_10_16 11327 30504 63569 8.419M items/s topk_r_1_c_100_k_10_th_16
BM_TopK_CPU_1_100_50_16 12627 32892 55144 7.553M items/s topk_r_1_c_100_k_50_th_16
BM_TopK_CPU_1_100_100_16 11652 31223 60864 8.184M items/s topk_r_1_c_100_k_100_th_16
BM_TopK_CPU_32_100_1_16 13250 34320 54928 230.319M items/s topk_r_32_c_100_k_1_th_16
BM_TopK_CPU_32_100_2_16 26665 60819 26029 114.450M items/s topk_r_32_c_100_k_2_th_16
BM_TopK_CPU_32_100_10_16 63596 107083 10000 47.987M items/s topk_r_32_c_100_k_10_th_16
BM_TopK_CPU_32_100_50_16 71986 399399 10000 42.394M items/s topk_r_32_c_100_k_50_th_16
BM_TopK_CPU_32_100_100_16 121448 166212 5654 25.128M items/s topk_r_32_c_100_k_100_th_16
BM_TopK_CPU_128_100_1_16 22870 53340 30782 533.753M items/s topk_r_128_c_100_k_1_th_16
BM_TopK_CPU_128_100_2_16 84910 129859 7689 143.765M items/s topk_r_128_c_100_k_2_th_16
BM_TopK_CPU_128_100_10_16 73454 588543 8459 166.186M items/s topk_r_128_c_100_k_10_th_16
BM_TopK_CPU_128_100_50_16 153249 1613522 3995 79.655M items/s topk_r_128_c_100_k_50_th_16
BM_TopK_CPU_128_100_100_16 99301 644868 6158 122.929M items/s topk_r_128_c_100_k_100_th_16
BM_TopK_CPU_128_1000_1_16 118921 303889 5922 1.002G items/s topk_r_128_c_1000_k_1_th_16
BM_TopK_CPU_128_1000_2_16 88127 865589 7286 1.353G items/s topk_r_128_c_1000_k_2_th_16
BM_TopK_CPU_128_1000_10_16 145271 1823537 6753 840.296M items/s topk_r_128_c_1000_k_10_th_16
BM_TopK_CPU_128_1000_50_16 301501 4064909 2411 404.875M items/s topk_r_128_c_1000_k_50_th_16
BM_TopK_CPU_128_1000_100_16 353554 4930598 1946 345.267M items/s topk_r_128_c_1000_k_100_th_16
BM_TopK_CPU_128_1000_500_16 801144 11395721 841 152.370M items/s topk_r_128_c_1000_k_500_th_16
BM_TopK_CPU_128_1000_1000_16 596016 8045916 1000 204.810M items/s topk_r_128_c_1000_k_1000_th_16
BM_TopK_CPU_16_10000_10000_16 995618 13666999 704 153.259M items/s topk_nmt_r_16_c_10000_k_10000_th_16
BM_TopK_CPU_16_20000_20000_16 2185050 29367969 329 139.665M items/s topk_nmt_r_16_c_20000_k_20000_th_16
BM_TopK_CPU_16_50000_50000_16 5335686 70232583 100 142.988M items/s topk_nmt_r_16_c_50000_k_50000_th_16
BM_TopK_CPU_16_100000_100000_16 13261121 182133255 54 115.064M items/s topk_nmt_r_16_c_100000_k_100000_th_16
BM_TopK_CPU_16_35000_35000_16 3804972 46840954 175 140.358M items/s topk_nmt_r_16_c_35000_k_35000_th_16
BM_TopK_CPU_16_70000_70000_16 8645191 114955282 80 123.550M items/s topk_nmt_r_16_c_70000_k_70000_th_16
BM_TopK_CPU_16_175000_175000_16 20246335 283943657 27 131.890M items/s topk_nmt_r_16_c_175000_k_175000_th_16
BM_TopK_CPU_16_350000_350000_16 57789267 731309131 10 92.415M items/s topk_nmt_r_16_c_350000_k_350000_th_16
BM_TopK_CPU_128_10000_10000_16 5954690 89560492 96 204.999M items/s topk_nmt_r_128_c_10000_k_10000_th_16
BM_TopK_CPU_128_20000_20000_16 13325200 198147690 44 183.217M items/s topk_nmt_r_128_c_20000_k_20000_th_16
BM_TopK_CPU_128_50000_50000_16 37807096 572441780 18 161.438M items/s topk_nmt_r_128_c_50000_k_50000_th_16
BM_TopK_CPU_128_100000_100000_16 82091346 1247353170 8 148.701M items/s topk_nmt_r_128_c_100000_k_100000_th_16
BM_TopK_CPU_128_35000_35000_16 23132792 356827606 26 184.693M items/s topk_nmt_r_128_c_35000_k_35000_th_16
BM_TopK_CPU_128_70000_70000_16 57629398 862286173 10 148.274M items/s topk_nmt_r_128_c_70000_k_70000_th_16
BM_TopK_CPU_128_175000_175000_16 150355403 2256456964 4 142.079M items/s topk_nmt_r_128_c_175000_k_175000_th_16
BM_TopK_CPU_128_350000_350000_16 331568068 4988966946 2 128.856M items/s topk_nmt_r_128_c_350000_k_350000_th_16

relative throughput difference (new - old)/old:
$ paste /tmp/OLD /tmp/NEW | perl -ne '@r = $_ =~ /([\d\.]+[MG]) it/g; if ($r[0] =~ /G/) { $r[0] = 1000*$r[0] }; if ($r[1] =~ /G/) { $r[1] = 1000*$r[1]}; if (@r) {printf("%s\t\trelative throughput difference: %.2f%%\n", (split(" ",$_))[-1], ($r[1] - $r[0])/$r[0] * 100)}'
topk_r_1_c_100_k_1_th_16 relative throughput difference: 4.41%
topk_r_1_c_100_k_2_th_16 relative throughput difference: 2.22%
topk_r_1_c_100_k_10_th_16 relative throughput difference: 2.45%
topk_r_1_c_100_k_50_th_16 relative throughput difference: 1.56%
topk_r_1_c_100_k_100_th_16 relative throughput difference: 19.96%
topk_r_32_c_100_k_1_th_16 relative throughput difference: -0.17%
topk_r_32_c_100_k_2_th_16 relative throughput difference: 0.55%
topk_r_32_c_100_k_10_th_16 relative throughput difference: -3.02%
topk_r_32_c_100_k_50_th_16 relative throughput difference: -9.00%
topk_r_32_c_100_k_100_th_16 relative throughput difference: 24.48%
topk_r_128_c_100_k_1_th_16 relative throughput difference: -2.11%
topk_r_128_c_100_k_2_th_16 relative throughput difference: 9.48%
topk_r_128_c_100_k_10_th_16 relative throughput difference: 12.90%
topk_r_128_c_100_k_50_th_16 relative throughput difference: -5.11%
topk_r_128_c_100_k_100_th_16 relative throughput difference: 7.14%
topk_r_128_c_1000_k_1_th_16 relative throughput difference: -7.14%
topk_r_128_c_1000_k_2_th_16 relative throughput difference: 0.74%
topk_r_128_c_1000_k_10_th_16 relative throughput difference: -18.58%
topk_r_128_c_1000_k_50_th_16 relative throughput difference: -6.26%
topk_r_128_c_1000_k_100_th_16 relative throughput difference: 0.99%
topk_r_128_c_1000_k_500_th_16 relative throughput difference: -1.27%
topk_r_128_c_1000_k_1000_th_16 relative throughput difference: 11.03%
topk_nmt_r_16_c_10000_k_10000_th_16 relative throughput difference: 8.32%
topk_nmt_r_16_c_20000_k_20000_th_16 relative throughput difference: 2.72%
topk_nmt_r_16_c_50000_k_50000_th_16 relative throughput difference: 21.85%
topk_nmt_r_16_c_100000_k_100000_th_16 relative throughput difference: 12.62%
topk_nmt_r_16_c_35000_k_35000_th_16 relative throughput difference: 21.88%
topk_nmt_r_16_c_70000_k_70000_th_16 relative throughput difference: 13.56%
topk_nmt_r_16_c_175000_k_175000_th_16 relative throughput difference: 22.21%
topk_nmt_r_16_c_350000_k_350000_th_16 relative throughput difference: 32.23%
topk_nmt_r_128_c_10000_k_10000_th_16 relative throughput difference: 27.22%
topk_nmt_r_128_c_20000_k_20000_th_16 relative throughput difference: 24.64%
topk_nmt_r_128_c_50000_k_50000_th_16 relative throughput difference: 22.61%
topk_nmt_r_128_c_100000_k_100000_th_16 relative throughput difference: 22.31%
topk_nmt_r_128_c_35000_k_35000_th_16 relative throughput difference: 37.06%
topk_nmt_r_128_c_70000_k_70000_th_16 relative throughput difference: 23.54%
topk_nmt_r_128_c_175000_k_175000_th_16 relative throughput difference: 39.27%
topk_nmt_r_128_c_350000_k_350000_th_16 relative throughput difference: 57.83%

PiperOrigin-RevId: 165455465
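For readers who do not read Perl, the relative-throughput calculation above, (new - old)/old, can be sketched in Python. The function names here are made up for this illustration, and the only assumption about the input is that each benchmark row contains a throughput such as `9.107M items/s` or `1.079G items/s`, as in the rows quoted in the commit message.

```python
import re

# Scale factors that normalise a throughput to "M items/s".
_SCALE = {"M": 1.0, "G": 1000.0}


def throughput_m_items(row):
    """Return the throughput of one benchmark row in M items/s, or None."""
    match = re.search(r"([\d.]+)([MG]) items/s", row)
    if match is None:
        return None
    return float(match.group(1)) * _SCALE[match.group(2)]


def relative_difference(old_row, new_row):
    """Return (new - old) / old as a percentage."""
    old = throughput_m_items(old_row)
    new = throughput_m_items(new_row)
    return (new - old) / old * 100.0


old_row = "BM_TopK_CPU_1_100_1_16 10472 29186 48042 9.107M items/s topk_r_1_c_100_k_1_th_16"
new_row = "BM_TopK_CPU_1_100_1_16 10029 27379 69810 9.509M items/s topk_r_1_c_100_k_1_th_16"
print("%.2f%%" % relative_difference(old_row, new_row))  # prints 4.41%
```

Normalising the `G` rows to millions before dividing mirrors the `1000*$r[...]` step in the one-liner.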
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165454545
-
Committed by Jianwei Xie
PiperOrigin-RevId: 165452154
-
Committed by A. Unique TensorFlower
Fix the bug where ExternalOptimizerInterface pops out 'method' option. Make a copy of optimizer_kwargs before 'pop'. PiperOrigin-RevId: 165451145
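The kind of bug described here, and the copy-before-pop fix, can be sketched in a few lines of Python. The function name and the default method string are assumptions made for this illustration; this is not the actual ExternalOptimizerInterface code.

```python
def split_method(optimizer_kwargs):
    """Hypothetical helper: read 'method' without mutating the caller's dict."""
    kwargs = dict(optimizer_kwargs)            # shallow copy, so pop() is local
    method = kwargs.pop("method", "L-BFGS-B")  # default chosen for illustration
    return method, kwargs


options = {"method": "SLSQP", "options": {"maxiter": 10}}
method, remaining = split_method(options)
# 'method' is still present in the caller's dict, so a second call
# sees the same options instead of silently falling back to the default.
assert "method" in options
```

Popping from the original dict would remove the key for every later call that reuses the same kwargs, which is the behaviour the commit message describes as the bug.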
-
Committed by A. Unique TensorFlower
when a checkpoint is created. PiperOrigin-RevId: 165450245
-
- 16 Aug, 2017 16 commits
-
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165444673
-
Committed by Mark Daoust
PiperOrigin-RevId: 165428812
-
Committed by A. Unique TensorFlower
needed for the fetch. PiperOrigin-RevId: 165404677
-
Committed by Chris Leary
Prevents us from compiling/running a bunch of device computations that are easily elided by using the host to generate the fake data, which de-clutters logs and makes dropping into a debugger easier. PiperOrigin-RevId: 165401567
-
Committed by A. Unique TensorFlower
2. Add an API to allow easier profile retrieval. Currently in contrib. PiperOrigin-RevId: 165399640
-
Committed by Mingxing Tan
PiperOrigin-RevId: 165399126
-
Committed by Justine Tunney
PiperOrigin-RevId: 165397425
-
Committed by James Qin
PiperOrigin-RevId: 165389504
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165389240
-
Committed by A. Unique TensorFlower
2. Allow adding multiple RunMetadata for one step (e.g. one for variable initialization, one for training), so that the step has a complete profile.
3. Improve tests a bit.
PiperOrigin-RevId: 165385567
-
Committed by Asim Shankar
PiperOrigin-RevId: 165385120
-
Committed by Skye Wanderman-Milne
This is analogous to TF_Output in the C API and Output in the public C++ API. PiperOrigin-RevId: 165384397
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165379414
-
Committed by Skye Wanderman-Milne
Here's an example of the new generated code:

  AddN::AddN(const ::tensorflow::Scope& scope, ::tensorflow::InputList inputs) {
    if (!scope.ok()) return;
    auto _inputs = ::tensorflow::ops::AsNodeOutList(scope, inputs);
    if (!scope.ok()) return;
    ::tensorflow::Node* ret;
    const auto unique_name = scope.GetUniqueNameForOp("AddN");
    auto builder = ::tensorflow::NodeBuilder(unique_name, "AddN")
                       .Input(_inputs)
    ;
    scope.UpdateBuilder(&builder);
    scope.UpdateStatus(builder.Finalize(scope.graph(), &ret));
    if (!scope.ok()) return;
    scope.UpdateStatus(scope.DoShapeInference(ret));
    this->sum = Output(ret, 0);
  }

Enabling shape inference unfortunately broke many tests. I fixed some of them, but for others I introduced a Scope::DisabledShapeInferenceScope() static method that returns a scope that doesn't perform shape inference. Eventually we should fix the tests that use this and remove it.
PiperOrigin-RevId: 165378429
-
Committed by Ali Yahya
PiperOrigin-RevId: 165374434
-
Committed by A. Unique TensorFlower
PiperOrigin-RevId: 165373422
-