1. 25 Sep 2019, 9 commits
    • W
      add support for tensor and tensorlist in strided_slice OP (#19929) · 382d099d
      Committed by wangchaochaohu
      * add support for tensor and tensorlist in strided_slice OP test=develop
      
      * fix the comment test=develop
      
      * fix test=develop
      
      * fix the bug test=develop
      
      * delete log test=develop
      
      * fix API.spec test=develop
      
      * fix test=develop
      382d099d
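      For context, a minimal numpy sketch of the strided-slice semantics this OP implements (illustrative only, not the Paddle kernel or API; the helper name is mine). What the PR adds on top of this is that the `starts`/`ends` values may also be given as a Tensor or a list of Tensors, so the slice bounds can be decided at runtime rather than at graph-build time.

      ```python
      import numpy as np

      def strided_slice_ref(x, axes, starts, ends, strides):
          """Apply start:end:stride slicing on the listed axes; other axes are kept whole."""
          slices = [slice(None)] * x.ndim
          for axis, s, e, st in zip(axes, starts, ends, strides):
              slices[axis] = slice(s, e, st)
          return x[tuple(slices)]

      x = np.arange(24).reshape(2, 3, 4)
      out = strided_slice_ref(x, axes=[1, 2], starts=[0, 1], ends=[3, 4], strides=[2, 2])
      print(out.shape)  # (2, 2, 2)
      ```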
    • L
      Fix ssdloss num and batch norm format and conv2d (#19754) · fe218df3
      Committed by lvmengsi
      * update API.spec
      fe218df3
    • L
      Fix OpTest of bn (#19062) · 619a241b
      Committed by lvmengsi
      * fix bn
      619a241b
    • B
      add support for matmul with multiple heads even with different width and height (#19708) · c670058a
      Committed by Bob Zhu
      * add support for matmul with multiple heads even with different width and height
      
      Original matmul with multiple heads supports only mat_a.width == mat_b.height;
      in that case, mat_b is horizontally split. This patch extends the support to
      mat_a.width != mat_b.height as long as mat_a.width / head_number == mat_b.height;
      in this case, mat_b is vertically split.
      
      For example, A is [3, 8], B is [2, 16], and head_number is 4. A is split into
      4 matrices of [3, 2] and B is (vertically) split into 4 matrices of [2, 4].
      The final result is 4 matrices of [3, 4], concatenated into [3, 16] (a small
      numpy sketch follows this entry).
      
      test=develop
      
      * refactor the code of matmul with multiple heads even with different width and height
      
      test=develop
      c670058a
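      A minimal numpy sketch of the splitting scheme described in this entry (function name and example values are mine, not the Paddle kernel): both operands are split into head_number column blocks, each pair of blocks is multiplied, and the per-head results are concatenated.

      ```python
      import numpy as np

      def multihead_matmul_ref(a, b, head_number):
          """a: [M, K], b: [K / head_number, N]; requires K % head_number == 0."""
          a_blocks = np.split(a, head_number, axis=1)  # blocks of [M, K / head_number]
          b_blocks = np.split(b, head_number, axis=1)  # blocks of [K / head_number, N / head_number]
          # each per-head product is [M, N / head_number]; concatenation gives [M, N]
          return np.concatenate([ai @ bi for ai, bi in zip(a_blocks, b_blocks)], axis=1)

      a = np.random.rand(3, 8)   # mat_a.width = 8
      b = np.random.rand(2, 16)  # mat_b.height = 2 = 8 / head_number
      print(multihead_matmul_ref(a, b, head_number=4).shape)  # (3, 16)
      ```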
    • L
      refine ctc align op with padding (#19926) · 6884dc80
      Committed by Liufang Sang
      * refine ctc align op with padding 
      * refine api sample code
      6884dc80
    • Z
      Fix C++ inference bug: when memory optim and the TRT subgraph are enabled at the... · e89b1288
      Committed by Zhaolong Xing
      Fix C++ inference bug: when memory optim and the TRT subgraph are enabled at the same time, there is a bug (#19969)
      
      * fix memory optimization type
      test=develop
      
      * 1. fix BUG: enabling TRT and memory optim together triggers a bug.
      2. Clean up the memory optim bug.
      test=develop
      e89b1288
    • W
      Add support for new QAT models (#18970) · 4286a627
      Committed by Wojciech Uss
      * Add support for new QAT models
      
      test=develop
      Co-Authored-By: Michał Gallus <michal.gallus@intel.com>
      Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com>
      
      * fixed fps results
      
      test=develop
      
      * fix top5 accuracy drop problem
      
      * updated for new QAT models
      
      * skip quantizing average pooling - dirty but working
      
      * add missing pass
      
      * added missing conv+brelu fuse pass
      
      * removed a call to non-existent pass
      
      test=develop
      
      * renamed pass
      
      test=develop
      
      * Adjust finding pooling scale to newest QAT models
      
      * Remove unnecessary code from quantization_mkldnn_pass
      
      * Copy Pooling input scale to output scale in QAT
      
      * Refactor & remove unused code in QAT
      
      * Incorporate fp32 FC into QAT
      
      test=develop
      
      * Enable graph drawing with debug flag
      
      test=develop
      
      * Add tests for QATv2
      
      * Fix paths for QATv2 models
      
      test=develop
      
      * Add option to save transformed int8 qat model
      
      test=develop
      
      * Remove redundant lines from qat mkldnn pass
      
      test=develop
      
      * Delegate disablement of avg pooling to qat
      
      test=develop
      
      * fix CI bug, test=develop
      
      * Follow Wangzhen's Review, test=develop
      
      * Update API.spec
      
      test=develop
      
      * Name False in (is_unsigned, TensorScale) tuple
      
      test=develop
      4286a627
    • A
      Removing length dims constraints of seq_pad and seq_unpad (#19497) · 99a9615a
      Committed by Aurelius84
      * Removing last dims constraints of seq_pad and seq_unpad test=develop
      
      * fix test_layer api code test=develop
      
      * fix sequence_pad_op.cc conflict test=develop
      
      * remove test_analyzer_mm_dnn test=develop
      
      * fix vectorize bug test=develop
      
      * fix vectorize<int> test=develop
      99a9615a
    • C
      polish multi process warning info (#19961) · cca26f5c
      Committed by chengduo
      test=develop
      cca26f5c
  2. 24 Sep 2019, 14 commits
  3. 23 Sep 2019, 10 commits
    • M
      Forward recompute3 (#19913) · 9901f696
      Committed by mapingshuo
      * add recompute-based checkpoint methods for large batch training (a conceptual sketch follows this entry)
      test=develop
      
      * add append_backward_with_forward_recomputation
      test=develop
      
      * refine optimizer
      test=develop
      
      * update backward and optimizer
      test=develop
      
      * make Variable usable
      test=develop
      
      * add recompute code
      
      * refine optimizer
      test=develop
      
      * refine addup in _append_backward_ops_with_checkpoints_:
      1) for the recompute part, just cache the grad_op_desc without appending it to the block
      2) before appending grad_op_desc to the backward part, run addup_repetitive_vars and remove the unused branch
      test=develop
      
      * make method private
      
      * add recompute strategy into DistributedStrategy
      test=develop
      
      * checkpoint version3
      test=develop
      
      * remove some print information
      test=develop
      
      * remove unused sumop
      test=develop
      
      * try to fix recompute with graph building modules
      
      * add input names to the vars that should be held
      
      * add memory debug tool
      
      * backup backward
      
      * Fix bugs
      
      * add backward desc for op not in any segments
      
      * add exception info for sub_block
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * remove print functions
      
      test=develop
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * make Recompute a child class of Optimizer
      
      test=develop
      test=document_preview
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      test=develop
      test=document_preview
      
      * add document for Recompute
      
      test=develop
      test=document_preview
      
      * change API doc of Recompute
      
      test=develop
      test=document_preview
      
      * code cleaning
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      * fix bugs when segments hold no element
      
      * add testcase for Recompute Optimizer
      
      test=develop
      test=document_preview
      
      * add test for apply_gradient, and code cleaning
      
      test=develop
      test=document_preview
      
      * add test case for load function
      
      * enable CI
      
      test=develop
      test=document
      
      * add test case
      
      test=develop
      test=document_preview
      
      * add sample code for the 4 functions of the recompute optimizer
      
      test=develop
      test=document_preview
      9901f696
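      The idea behind the recompute (forward checkpointing) strategy above, as a framework-agnostic sketch (the toy layer chain and all names are mine, not Paddle's API): only activations at user-chosen checkpoints are stored during the forward pass, and each segment between checkpoints is re-run during the backward pass to rebuild the activations it needs, trading extra compute for lower memory on large batches.

      ```python
      import numpy as np

      # Toy chain of element-wise layers: each entry is (forward fn, local grad fn).
      layers = [
          (lambda x: np.maximum(x, 0.0), lambda x, dy: dy * (x > 0)),  # relu
          (lambda x: x * x,              lambda x, dy: dy * 2.0 * x),  # square
          (lambda x: np.maximum(x, 0.0), lambda x, dy: dy * (x > 0)),  # relu
          (lambda x: x * x,              lambda x, dy: dy * 2.0 * x),  # square
      ]

      def forward(x, checkpoints):
          """Run the chain, keeping only the activations entering checkpointed layer indices."""
          saved = {0: x}
          for i, (f, _) in enumerate(layers):
              x = f(x)
              if i + 1 in checkpoints:
                  saved[i + 1] = x
          return x, saved

      def backward(dy, saved):
          """Re-run each segment from its checkpoint to rebuild activations, then backprop through it."""
          bounds = sorted(saved) + [len(layers)]
          for lo, hi in reversed(list(zip(bounds[:-1], bounds[1:]))):
              acts = [saved[lo]]                    # recompute the segment's forward pass
              for f, _ in layers[lo:hi]:
                  acts.append(f(acts[-1]))
              for i in range(hi - 1, lo - 1, -1):   # chain rule within the segment
                  dy = layers[i][1](acts[i - lo], dy)
          return dy

      x = np.array([1.5, -2.0, 0.5])
      out, saved = forward(x, checkpoints={2})      # keep only x and the activation after layer 2
      print(backward(np.ones_like(out), saved))     # [13.5  0.   0.5], i.e. d(x**4)/dx for x > 0
      ```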
    • C
      Delete local execution scopes (#19749) · d7251a8e
      Committed by chengduo
      * Add RecordHistoryLocalExecScopes
      test=develop
      d7251a8e
    • W
      remove the useless warning to avoid confusing users test=develop (#19871) · 5452b6a1
      Committed by wopeizl
      * remove the useless warning to avoid confusing users test=develop
      5452b6a1
    • R
      add mse_loss (#19759) · d31c92a2
      Committed by ruri
      * add mse_loss op
      d31c92a2
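      For reference, what an MSE loss computes, as a short numpy sketch (not the Paddle API; the helper name is mine): the mean over all elements of the squared difference between prediction and label.

      ```python
      import numpy as np

      def mse_loss_ref(pred, label):
          # mean squared error over all elements
          return np.mean((pred - label) ** 2)

      print(mse_loss_ref(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.0])))  # 0.4166...
      ```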
    • H
      Add op compatible information (#19910) · 85b398f1
      Committed by hong
      * add op compatible information; test=develop
      
      * add enum type
      
      * add enum type; test=develop
      85b398f1
    • K
      fix softmax CE time limit check failure (#19846) · 3f021781
      Committed by Kaipeng Deng
      * fix softmax CE time limit check failure. test=develop
      
      * refine softmax calc. test=develop
      3f021781
    • T
      move tree_conv to fluid.contrib.layers (#19918) · a4919d36
      Committed by Tao Luo
      * move tree_conv to fluid.contrib.layers
      
      test=develop
      
      * update API.spec for tree_conv
      
      test=develop
      
      * update tree_conv api to increase unit coverage
      
      test=develop
      a4919d36
    • tensor_array_to_tensor_op.cc, test=develop (#19289) · 30adea0a
      Committed by 石晓伟
      30adea0a
    • Z
      Unify DataLoader APIs (#19305) · 0436efd6
      Committed by Zeng Jinle
      * unify DataLoader APIs, test=develop
      
      * integrate iterable CPU Dataset, test=develop
      add GPU dataset support, test=develop
      
      * add unittests for dataset, test=develop
      
      * add more docs to dataloader apis, test=develop, test=document_preview
      
      * refine doc, test=develop
      
      * refine doc again, test=develop
      
      * increase coverage, test=develop
      0436efd6
  4. 22 Sep 2019, 2 commits
  5. 21 Sep 2019, 5 commits
    • A
      Add support for other axes in MKLDNN softmax op (#19907) · cb65439d
      Committed by Adam
      * Initial, functional commit
      
      * Clean commit related files
      test=develop
      cb65439d
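      For reference, softmax along an arbitrary axis, as a numpy sketch of the semantics the MKL-DNN kernel now needs to cover for non-default axes (not the kernel code itself):

      ```python
      import numpy as np

      def softmax_ref(x, axis=-1):
          # subtract the per-slice max for numerical stability, then normalize
          e = np.exp(x - np.max(x, axis=axis, keepdims=True))
          return e / np.sum(e, axis=axis, keepdims=True)

      x = np.random.rand(2, 3, 4)
      print(softmax_ref(x, axis=1).sum(axis=1))  # every entry is 1.0
      ```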
    • J
      Feature/auto prune in dygraph (#19757) · 45425411
      Committed by Jiabin Yang
      * refactor dygraph,test=develop
      
      * fix failed unittest,test=develop
      
      * polish code,test=develop
      
      * check windows ci error,test=develop
      try to fix windows ci error by np.allclose,test=develop
      
      * polish vlog and profiler, test=develop
      
      * try to fix preceding ops order,test=develop
      
      * test transformer in windows ci, test=develop
      
      * use python c-api to speed up tracer.trace,test=develop
      
      * test=develop, fix docker with paddle nccl problem
      
      * test=develop, add ut for debug string and gradient_accumulator
      
      * test=develop, add tests for layer/gradient_accumulator/prepared_op
      
      * test=develop, fix compile error for test_prepared_op
      
      * test=develop, add more ut for dygraph
      
      * test=develop, create API.spec for dygraph api change
      
      * test=develop, refactor name to make it easier to understand
      
      * test=develop, fix multi-gpu failure, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ
      
      * test=develop, fix ut failed on parallel se-resnext
      
      * test=develop, change one more PADDLE_ENFORCE
      
      * support auto prune in dygraph mode
      
      * test=develop, support auto prune
      
      * test=develop, merge develop conflict
      
      * test=develop, fix test_layer and test_tracer ut
      
      * test=develop, fix bug which may cause stop_gradient to be disabled with a list of backward inputs
      45425411
    • P
      Add TRT input shape check between model and runtime (#19864) · baccd7e2
      Committed by Pei Yang
      * add TRT shape check, test=develop
      
      * model_input_shape == runtime_input_shape, refine message, test=develop
      baccd7e2
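      A hedged sketch of the kind of check this PR describes (illustrative Python, not the actual Paddle/TensorRT C++ code; all names are mine): compare the input shapes recorded in the model against the shapes seen at runtime and fail with a readable message when they differ.

      ```python
      def check_trt_input_shapes(model_input_shape, runtime_input_shape):
          """TensorRT engines are built for fixed input shapes, so the shape stored in
          the model must match the shape actually fed at inference time."""
          for name, model_shape in model_input_shape.items():
              runtime_shape = runtime_input_shape.get(name)
              if runtime_shape != model_shape:
                  raise ValueError(
                      "TRT input '%s': model shape %s does not match runtime shape %s; "
                      "rebuild the engine or feed inputs of the expected shape."
                      % (name, model_shape, runtime_shape))

      check_trt_input_shapes({"image": (1, 3, 224, 224)},
                             {"image": (1, 3, 224, 224)})  # passes silently
      ```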