1. 17 Aug, 2017 24 commits
    • P
      Android demo app for speech recognition · bf2365e7
      Committed by Pete Warden
      PiperOrigin-RevId: 165504820
      bf2365e7
    • A
      Use the new grpc::ByteBuffer::Swap() operation to avoid a memory allocation in · 89617e72
      Committed by A. Unique TensorFlower
      tensorflow::grpc::EncodeRecvTensorResponseToByteBuffer() and
      tensorflow::grpc::EncodeTensorToByteBuffer()
      
      The Swap() operation allows the assignment to the *result ByteBuffer
      to be performed without the allocation performed by grpc::ByteBuffer::operator=().
      
      PiperOrigin-RevId: 165504787
      89617e72
    • A
      Add set_hparam to tf.HParams. · a9018cc1
      Committed by A. Unique TensorFlower
      Unlike doing a direct assignment, set_hparam will perform the same sort of type checking that parse/parse_json would do.
      
      PiperOrigin-RevId: 165504169
      a9018cc1
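A minimal sketch of the difference between direct assignment and a type-checked setter. `MiniHParams` is a hypothetical stand-in written for illustration, not the TensorFlow class, and the strict "same type as the initial value" rule is an assumption; the real checks may be more lenient or more detailed.

```python
class MiniHParams:
    """Hypothetical hyperparameter container (illustration only)."""

    def __init__(self, **kwargs):
        # Remember the type each hyperparameter was created with.
        self._types = {name: type(value) for name, value in kwargs.items()}
        for name, value in kwargs.items():
            setattr(self, name, value)

    def set_hparam(self, name, value):
        """Assign `value` to an existing hyperparameter after a type check."""
        if name not in self._types:
            raise ValueError("Unknown hyperparameter: %s" % name)
        expected = self._types[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the hyperparameter itself is a bool.
        if isinstance(value, bool) and expected is not bool:
            raise ValueError("%s expects %s, got bool" % (name, expected.__name__))
        if not isinstance(value, expected):
            raise ValueError(
                "%s expects %s, got %s"
                % (name, expected.__name__, type(value).__name__)
            )
        setattr(self, name, value)
```

With this sketch, `hp.num_layers = "four"` succeeds silently, while `hp.set_hparam("num_layers", "four")` raises `ValueError`.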
    • J
      Adding xprof tracing for Dataset API. · 6daa3c9c
      Committed by Jiri Simsa
      PiperOrigin-RevId: 165500724
      6daa3c9c
    • Y
      Support second order gradient for fused batch norm. · 4f3b13cc
      Committed by Yao Zhang
      PiperOrigin-RevId: 165495580
      4f3b13cc
    • J
      Allow setting fields in construction in estimator.RunConfig. · 9f7f481f
      Committed by Jonathan Hseu
      PiperOrigin-RevId: 165485537
      9f7f481f
    • A
      Removes forced function definition for eager gradients. · 1375866d
      Committed by Alexandre Passos
      PiperOrigin-RevId: 165483637
      1375866d
    • A
      3a82208c
    • B
      Added preliminary support for arithmetic simplifications · c2478262
      Committed by Benoit Steiner
      PiperOrigin-RevId: 165476236
      c2478262
    • A
      [CPU] HandleSlice(): Use EmitTransferElements(). This is better for two · a711418d
      Committed by A. Unique TensorFlower
      reasons:
      * Adds metadata to the memcpy.
      * Automatically takes care of the case where the element being copied
      is a scalar.
      
      PiperOrigin-RevId: 165474762
      a711418d
    • A
      Add a function that returns whether an hlo is elementwise binary. · 9fc99811
      Committed by A. Unique TensorFlower
      PiperOrigin-RevId: 165470975
      9fc99811
    • A
      Removed thread pinning code from sparse matrix multiplication op. · f54f6370
      Committed by A. Unique TensorFlower
      PiperOrigin-RevId: 165469441
      f54f6370
    • A
      Initial tests for gradient code. · 4738e700
      Committed by Alexandre Passos
      PiperOrigin-RevId: 165467540
      4738e700
    • M
      Fix note formatting · cf2961dd
      Committed by Mark Daoust
      PiperOrigin-RevId: 165461843
      cf2961dd
    • Y
      Remove the dependency of cudnn_disable_conv_1x1_optimization flag of filter... · 198811c6
      Committed by Yangzihao Wang
      Remove the dependency of cudnn_disable_conv_1x1_optimization flag of filter with the same size as input shape.
      
      PiperOrigin-RevId: 165461092
      198811c6
    • C
      [XLA] Do not recompute points-to analysis in iterations of layout assignment. · 95e07ea9
      Committed by Chris Leary
      Assigning a layout should not mutate the aliasing properties of the graph.
      
      PiperOrigin-RevId: 165460991
      95e07ea9
    • A
      No public change. · 3900df5d
      Committed by A. Unique TensorFlower
      PiperOrigin-RevId: 165460865
      3900df5d
    • A
      Allow Defun to inherit enclosing XLA compilation scope. · 48c48729
      Committed by A. Unique TensorFlower
      PiperOrigin-RevId: 165457067
      48c48729
    • A
      Remove incorrect comment related to `logits_input`. · ea1a9de1
      Committed by A. Unique TensorFlower
      PiperOrigin-RevId: 165455593
      ea1a9de1
    • A
      Speed up topK for the k == num_cols case. · 333b27fd
      Committed by A. Unique TensorFlower
      std::stable_sort is slower than std::sort, so run std::sort and then go back and deal with runs of equal entries.
      
      before:
      
      CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
      Benchmark                          Time(ns)        CPU(ns)     Iterations
      -------------------------------------------------------------------------
      BM_TopK_CPU_1_100_1_16                10472          29186          48042  9.107M items/s topk_r_1_c_100_k_1_th_16
      BM_TopK_CPU_1_100_2_16                10860          29423          65483  8.782M items/s topk_r_1_c_100_k_2_th_16
      BM_TopK_CPU_1_100_10_16               11604          31041          61883  8.218M items/s topk_r_1_c_100_k_10_th_16
      BM_TopK_CPU_1_100_50_16               12823          33145          55596  7.437M items/s topk_r_1_c_100_k_50_th_16
      BM_TopK_CPU_1_100_100_16              13980          35452          49324  6.822M items/s topk_r_1_c_100_k_100_th_16
      BM_TopK_CPU_32_100_1_16               13227          33997          53782  230.713M items/s topk_r_32_c_100_k_1_th_16
      BM_TopK_CPU_32_100_2_16               26811          61550          25983  113.824M items/s topk_r_32_c_100_k_2_th_16
      BM_TopK_CPU_32_100_10_16              61673         105223           8326  49.483M items/s topk_r_32_c_100_k_10_th_16
      BM_TopK_CPU_32_100_50_16              65507         349948          10000  46.587M items/s topk_r_32_c_100_k_50_th_16
      BM_TopK_CPU_32_100_100_16            151183         198478           4602  20.186M items/s topk_r_32_c_100_k_100_th_16
      BM_TopK_CPU_128_100_1_16              22387          52298          31619  545.262M items/s topk_r_128_c_100_k_1_th_16
      BM_TopK_CPU_128_100_2_16              92960         141800           8176  131.315M items/s topk_r_128_c_100_k_2_th_16
      BM_TopK_CPU_128_100_10_16             82928         749170           8219  147.200M items/s topk_r_128_c_100_k_10_th_16
      BM_TopK_CPU_128_100_50_16            145420        1639555           4578  83.943M items/s topk_r_128_c_100_k_50_th_16
      BM_TopK_CPU_128_100_100_16           106392         745496           6634  114.737M items/s topk_r_128_c_100_k_100_th_16
      BM_TopK_CPU_128_1000_1_16            110449         277875           6463  1.079G items/s topk_r_128_c_1000_k_1_th_16
      BM_TopK_CPU_128_1000_2_16             88770         864674           7004  1.343G items/s topk_r_128_c_1000_k_2_th_16
      BM_TopK_CPU_128_1000_10_16           115540        1391511           5702  1.032G items/s topk_r_128_c_1000_k_10_th_16
      BM_TopK_CPU_128_1000_50_16           282633        3840198           2435  431.904M items/s topk_r_128_c_1000_k_50_th_16
      BM_TopK_CPU_128_1000_100_16          357064        4882968           1943  341.873M items/s topk_r_128_c_1000_k_100_th_16
      BM_TopK_CPU_128_1000_500_16          790974       11271216            848  154.329M items/s topk_r_128_c_1000_k_500_th_16
      BM_TopK_CPU_128_1000_1000_16         661784        9318164           1000  184.457M items/s topk_r_128_c_1000_k_1000_th_16
      BM_TopK_CPU_16_10000_10000_16       1078471       15056662            648  141.485M items/s topk_nmt_r_16_c_10000_k_10000_th_16
      BM_TopK_CPU_16_20000_20000_16       2244454       32823888            300  135.969M items/s topk_nmt_r_16_c_20000_k_20000_th_16
      BM_TopK_CPU_16_50000_50000_16       6501366       93873780            100  117.351M items/s topk_nmt_r_16_c_50000_k_50000_th_16
      BM_TopK_CPU_16_100000_100000_16    14934618      203006700             48  102.171M items/s topk_nmt_r_16_c_100000_k_100000_th_16
      BM_TopK_CPU_16_35000_35000_16       4637517       64356466            159  115.160M items/s topk_nmt_r_16_c_35000_k_35000_th_16
      BM_TopK_CPU_16_70000_70000_16       9817851      137707012             72  108.793M items/s topk_nmt_r_16_c_70000_k_70000_th_16
      BM_TopK_CPU_16_175000_175000_16    24743808      379850278             23  107.917M items/s topk_nmt_r_16_c_175000_k_175000_th_16
      BM_TopK_CPU_16_350000_350000_16    76416261     1056343492              9  69.888M items/s topk_nmt_r_16_c_350000_k_350000_th_16
      BM_TopK_CPU_128_10000_10000_16      7575714      110759309             90  161.134M items/s topk_nmt_r_128_c_10000_k_10000_th_16
      BM_TopK_CPU_128_20000_20000_16     16608244      245613340             44  147.000M items/s topk_nmt_r_128_c_20000_k_20000_th_16
      BM_TopK_CPU_128_50000_50000_16     46355581      687585085             15  131.667M items/s topk_nmt_r_128_c_50000_k_50000_th_16
      BM_TopK_CPU_128_100000_100000_16  100402412     1545361856              6  121.581M items/s topk_nmt_r_128_c_100000_k_100000_th_16
      BM_TopK_CPU_128_35000_35000_16     31705595      475528520             23  134.754M items/s topk_nmt_r_128_c_35000_k_35000_th_16
      BM_TopK_CPU_128_70000_70000_16     71193517     1072309510              9  120.024M items/s topk_nmt_r_128_c_70000_k_70000_th_16
      BM_TopK_CPU_128_175000_175000_16  209401218     3182226697              3  102.016M items/s topk_nmt_r_128_c_175000_k_175000_th_16
      BM_TopK_CPU_128_350000_350000_16  523319920     8118120696              1  81.641M items/s topk_nmt_r_128_c_350000_k_350000_th_16
      
      after:
      
      CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
      Benchmark                          Time(ns)        CPU(ns)     Iterations
      -------------------------------------------------------------------------
      BM_TopK_CPU_1_100_1_16                10029          27379          69810  9.509M items/s topk_r_1_c_100_k_1_th_16
      BM_TopK_CPU_1_100_2_16                10624          28889          66771  8.977M items/s topk_r_1_c_100_k_2_th_16
      BM_TopK_CPU_1_100_10_16               11327          30504          63569  8.419M items/s topk_r_1_c_100_k_10_th_16
      BM_TopK_CPU_1_100_50_16               12627          32892          55144  7.553M items/s topk_r_1_c_100_k_50_th_16
      BM_TopK_CPU_1_100_100_16              11652          31223          60864  8.184M items/s topk_r_1_c_100_k_100_th_16
      BM_TopK_CPU_32_100_1_16               13250          34320          54928  230.319M items/s topk_r_32_c_100_k_1_th_16
      BM_TopK_CPU_32_100_2_16               26665          60819          26029  114.450M items/s topk_r_32_c_100_k_2_th_16
      BM_TopK_CPU_32_100_10_16              63596         107083          10000  47.987M items/s topk_r_32_c_100_k_10_th_16
      BM_TopK_CPU_32_100_50_16              71986         399399          10000  42.394M items/s topk_r_32_c_100_k_50_th_16
      BM_TopK_CPU_32_100_100_16            121448         166212           5654  25.128M items/s topk_r_32_c_100_k_100_th_16
      BM_TopK_CPU_128_100_1_16              22870          53340          30782  533.753M items/s topk_r_128_c_100_k_1_th_16
      BM_TopK_CPU_128_100_2_16              84910         129859           7689  143.765M items/s topk_r_128_c_100_k_2_th_16
      BM_TopK_CPU_128_100_10_16             73454         588543           8459  166.186M items/s topk_r_128_c_100_k_10_th_16
      BM_TopK_CPU_128_100_50_16            153249        1613522           3995  79.655M items/s topk_r_128_c_100_k_50_th_16
      BM_TopK_CPU_128_100_100_16            99301         644868           6158  122.929M items/s topk_r_128_c_100_k_100_th_16
      BM_TopK_CPU_128_1000_1_16            118921         303889           5922  1.002G items/s topk_r_128_c_1000_k_1_th_16
      BM_TopK_CPU_128_1000_2_16             88127         865589           7286  1.353G items/s topk_r_128_c_1000_k_2_th_16
      BM_TopK_CPU_128_1000_10_16           145271        1823537           6753  840.296M items/s topk_r_128_c_1000_k_10_th_16
      BM_TopK_CPU_128_1000_50_16           301501        4064909           2411  404.875M items/s topk_r_128_c_1000_k_50_th_16
      BM_TopK_CPU_128_1000_100_16          353554        4930598           1946  345.267M items/s topk_r_128_c_1000_k_100_th_16
      BM_TopK_CPU_128_1000_500_16          801144       11395721            841  152.370M items/s topk_r_128_c_1000_k_500_th_16
      BM_TopK_CPU_128_1000_1000_16         596016        8045916           1000  204.810M items/s topk_r_128_c_1000_k_1000_th_16
      BM_TopK_CPU_16_10000_10000_16        995618       13666999            704  153.259M items/s topk_nmt_r_16_c_10000_k_10000_th_16
      BM_TopK_CPU_16_20000_20000_16       2185050       29367969            329  139.665M items/s topk_nmt_r_16_c_20000_k_20000_th_16
      BM_TopK_CPU_16_50000_50000_16       5335686       70232583            100  142.988M items/s topk_nmt_r_16_c_50000_k_50000_th_16
      BM_TopK_CPU_16_100000_100000_16    13261121      182133255             54  115.064M items/s topk_nmt_r_16_c_100000_k_100000_th_16
      BM_TopK_CPU_16_35000_35000_16       3804972       46840954            175  140.358M items/s topk_nmt_r_16_c_35000_k_35000_th_16
      BM_TopK_CPU_16_70000_70000_16       8645191      114955282             80  123.550M items/s topk_nmt_r_16_c_70000_k_70000_th_16
      BM_TopK_CPU_16_175000_175000_16    20246335      283943657             27  131.890M items/s topk_nmt_r_16_c_175000_k_175000_th_16
      BM_TopK_CPU_16_350000_350000_16    57789267      731309131             10  92.415M items/s topk_nmt_r_16_c_350000_k_350000_th_16
      BM_TopK_CPU_128_10000_10000_16      5954690       89560492             96  204.999M items/s topk_nmt_r_128_c_10000_k_10000_th_16
      BM_TopK_CPU_128_20000_20000_16     13325200      198147690             44  183.217M items/s topk_nmt_r_128_c_20000_k_20000_th_16
      BM_TopK_CPU_128_50000_50000_16     37807096      572441780             18  161.438M items/s topk_nmt_r_128_c_50000_k_50000_th_16
      BM_TopK_CPU_128_100000_100000_16   82091346     1247353170              8  148.701M items/s topk_nmt_r_128_c_100000_k_100000_th_16
      BM_TopK_CPU_128_35000_35000_16     23132792      356827606             26  184.693M items/s topk_nmt_r_128_c_35000_k_35000_th_16
      BM_TopK_CPU_128_70000_70000_16     57629398      862286173             10  148.274M items/s topk_nmt_r_128_c_70000_k_70000_th_16
      BM_TopK_CPU_128_175000_175000_16  150355403     2256456964              4  142.079M items/s topk_nmt_r_128_c_175000_k_175000_th_16
      BM_TopK_CPU_128_350000_350000_16  331568068     4988966946              2  128.856M items/s topk_nmt_r_128_c_350000_k_350000_th_16
      
      relative throughput difference (new - old)/old:
      
      $ paste /tmp/OLD /tmp/NEW | perl -ne '@r = $_ =~ /([\d\.]+[MG]) it/g; if ($r[0] =~ /G/) { $r[0] = 1000*$r[0] }; if ($r[1] =~ /G/) { $r[1] = 1000*$r[1]}; if (@r) {printf("%s\t\trelative throughput difference: %.2f%%\n", (split(" ",$_))[-1], ($r[1] - $r[0])/$r[0] * 100)}'
      
      topk_r_1_c_100_k_1_th_16		relative throughput difference: 4.41%
      topk_r_1_c_100_k_2_th_16		relative throughput difference: 2.22%
      topk_r_1_c_100_k_10_th_16		relative throughput difference: 2.45%
      topk_r_1_c_100_k_50_th_16		relative throughput difference: 1.56%
      topk_r_1_c_100_k_100_th_16		relative throughput difference: 19.96%
      topk_r_32_c_100_k_1_th_16		relative throughput difference: -0.17%
      topk_r_32_c_100_k_2_th_16		relative throughput difference: 0.55%
      topk_r_32_c_100_k_10_th_16		relative throughput difference: -3.02%
      topk_r_32_c_100_k_50_th_16		relative throughput difference: -9.00%
      topk_r_32_c_100_k_100_th_16		relative throughput difference: 24.48%
      topk_r_128_c_100_k_1_th_16		relative throughput difference: -2.11%
      topk_r_128_c_100_k_2_th_16		relative throughput difference: 9.48%
      topk_r_128_c_100_k_10_th_16		relative throughput difference: 12.90%
      topk_r_128_c_100_k_50_th_16		relative throughput difference: -5.11%
      topk_r_128_c_100_k_100_th_16		relative throughput difference: 7.14%
      topk_r_128_c_1000_k_1_th_16		relative throughput difference: -7.14%
      topk_r_128_c_1000_k_2_th_16		relative throughput difference: 0.74%
      topk_r_128_c_1000_k_10_th_16		relative throughput difference: -18.58%
      topk_r_128_c_1000_k_50_th_16		relative throughput difference: -6.26%
      topk_r_128_c_1000_k_100_th_16		relative throughput difference: 0.99%
      topk_r_128_c_1000_k_500_th_16		relative throughput difference: -1.27%
      topk_r_128_c_1000_k_1000_th_16		relative throughput difference: 11.03%
      topk_nmt_r_16_c_10000_k_10000_th_16		relative throughput difference: 8.32%
      topk_nmt_r_16_c_20000_k_20000_th_16		relative throughput difference: 2.72%
      topk_nmt_r_16_c_50000_k_50000_th_16		relative throughput difference: 21.85%
      topk_nmt_r_16_c_100000_k_100000_th_16		relative throughput difference: 12.62%
      topk_nmt_r_16_c_35000_k_35000_th_16		relative throughput difference: 21.88%
      topk_nmt_r_16_c_70000_k_70000_th_16		relative throughput difference: 13.56%
      topk_nmt_r_16_c_175000_k_175000_th_16		relative throughput difference: 22.21%
      topk_nmt_r_16_c_350000_k_350000_th_16		relative throughput difference: 32.23%
      topk_nmt_r_128_c_10000_k_10000_th_16		relative throughput difference: 27.22%
      topk_nmt_r_128_c_20000_k_20000_th_16		relative throughput difference: 24.64%
      topk_nmt_r_128_c_50000_k_50000_th_16		relative throughput difference: 22.61%
      topk_nmt_r_128_c_100000_k_100000_th_16		relative throughput difference: 22.31%
      topk_nmt_r_128_c_35000_k_35000_th_16		relative throughput difference: 37.06%
      topk_nmt_r_128_c_70000_k_70000_th_16		relative throughput difference: 23.54%
      topk_nmt_r_128_c_175000_k_175000_th_16		relative throughput difference: 39.27%
      topk_nmt_r_128_c_350000_k_350000_th_16		relative throughput difference: 57.83%
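
The percentages above come from the items/s column of the two benchmark runs. A rough Python equivalent of that arithmetic, offered as a sketch; the function names are hypothetical and only the `M` and `G` suffixes seen in the tables are handled.

```python
def to_millions(rate):
    """Convert a rate such as '9.107M' or '1.079G' to millions of items/s."""
    scale = {"M": 1.0, "G": 1000.0}[rate[-1]]
    return float(rate[:-1]) * scale


def relative_difference_percent(old_rate, new_rate):
    """Compute (new - old) / old as a percentage."""
    old = to_millions(old_rate)
    new = to_millions(new_rate)
    return (new - old) / old * 100.0


# First row of the tables: 9.107M items/s before, 9.509M items/s after.
print("%.2f%%" % relative_difference_percent("9.107M", "9.509M"))  # 4.41%
```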
      
      PiperOrigin-RevId: 165455465
      333b27fd
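The idea in this commit message (sort without a stability guarantee, then repair runs of equal entries) can be sketched in Python. This illustrates the technique only and is not the TensorFlow C++ code. Two assumptions: ties should end up in ascending index order, and the shuffle below is there purely to mimic the arbitrary tie order an unstable sort such as `std::sort` may produce, since Python's own sort is stable.

```python
import random


def top_k_all_columns(values):
    """Return (sorted_values, indices) for k == len(values), largest first."""
    order = list(range(len(values)))
    # Simulate an unstable sort: scramble first so equal values land in
    # an arbitrary relative order after sorting by value alone.
    random.Random(0).shuffle(order)
    order.sort(key=lambda i: values[i], reverse=True)

    # Fix-up pass: find each run of equal values and put its indices
    # back into ascending order.
    start = 0
    while start < len(order):
        end = start + 1
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        if end - start > 1:
            order[start:end] = sorted(order[start:end])
        start = end
    return [values[i] for i in order], order
```

The fix-up pass only does extra work where duplicates exist, which is why this can beat a fully stable sort when ties are rare.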
    • A
      Fix parameter naming in docstring and quote it. · 320e64bc
      Committed by A. Unique TensorFlower
      PiperOrigin-RevId: 165454545
      320e64bc
    • J
      Adds outfeed-based eval support on TpuEstimator. · 683d6682
      Committed by Jianwei Xie
      PiperOrigin-RevId: 165452154
      683d6682
    • A
      Fix the bug where ExternalOptimizerInterface pops out 'method' option. Make... · 955a0523
      Committed by A. Unique TensorFlower
      Fix the bug where ExternalOptimizerInterface pops out 'method' option. Make a copy of optimizer_kwargs before 'pop'.
      
      PiperOrigin-RevId: 165451145
      955a0523
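A small sketch of the kind of bug described here and the copy-before-pop fix. The function names and the `"default-method"` fallback are hypothetical and chosen for illustration; this is not the actual ExternalOptimizerInterface code.

```python
def pick_method_buggy(optimizer_kwargs):
    # pop() removes 'method' from the caller's dict, so a second call
    # with the same dict silently falls back to the default.
    return optimizer_kwargs.pop("method", "default-method")


def pick_method_fixed(optimizer_kwargs):
    # Work on a shallow copy so the caller's dict is left untouched.
    kwargs = dict(optimizer_kwargs)
    return kwargs.pop("method", "default-method")


options = {"method": "SLSQP", "maxiter": 10}
pick_method_fixed(options)
print("method" in options)  # True: the caller's options are preserved
```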
    • A
      Add a CheckpointSaverListener subclass which evaluates and exports a model · 9e6ee10f
      Committed by A. Unique TensorFlower
      when a checkpoint is created.
      
      PiperOrigin-RevId: 165450245
      9e6ee10f
  2. 16 Aug, 2017 16 commits