1. 20 May, 2020 1 commit
    • H
      Fix test_cond flaky test under Windows (#24633) · 00c8ee18
      Huihuang Zheng committed
      In the past, test_cond failed with about 2% probability, and the failure was easy to reproduce.
      
      I have now re-run it 300 times with no failures. If the failure rate were still 2%, the probability of 300 consecutive passes would be (1 - 2%) ^ 300 ~= 0.002. We can say the random failure has disappeared. Maybe someone fixed some bugs in PE.
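The estimate in this message can be checked with a short sketch. It assumes each run fails independently with the same probability; the function name is illustrative and not part of the Paddle test suite.

```python
def prob_all_pass(fail_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass, given a per-run failure rate."""
    return (1.0 - fail_rate) ** runs


if __name__ == "__main__":
    # Hypothetical inputs taken from the commit message: 2% failure rate, 300 re-runs.
    p = prob_all_pass(0.02, 300)
    print(f"P(300 consecutive passes | 2% failure rate) = {p:.4f}")
```

With a 2% per-run failure rate the result is roughly 0.0023, so 300 clean runs would be unlikely if the old bug were still present.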
        
  2. 12 April, 2020 1 commit
    • H
      Error Message Enhancement (#23483) · 1d3b0134
      Huihuang Zheng committed
      This PR enhances error messages of several API/OPs:
      
      ParallelExecutor (python && C++)
      Executor (python && C++)
      StaticRNN (python)
      IfElse (python)
      cond (python)
      split_lod_tensor (python && C++)
  3. 11 April, 2020 1 commit
    • H
      Temporary Disable Flaky test_cond Under Windows (#23424) · 4c57e395
      Huihuang Zheng committed
      The flaky Windows test is hard to debug. It exits with code 0xc0000374 and no log, so we don't know where or why it fails. The probability of failure is about 1/50.
      
      I spent 3 days on it and found that it happens only with PE + control flow + Windows. Exit code 0xc0000374 indicates heap corruption or an access violation, but I found that memory was sufficient during debugging. There were no failures in 500+ Linux test runs. I suspect the cause is a difference in multithreading between Windows and Linux, but I don't have time to debug it fully now. I will temporarily disable the test and fix it in the coming days.
  4. 26 February, 2020 1 commit
    • S
      support control flow cond in dygraph mode (#22693) · b813c948
      songyouwei committed
      * dygraph support cond op
      test=develop
      
      * unittest coverage
      test=develop
      
      * fix coverage
      test=develop
      
      * fix for coverage
      test=develop
      
      * refine TypeError msg
      test=develop
      
      * remove restrict
      test=develop
  5. 06 January, 2020 1 commit
  6. 18 December, 2019 1 commit
  7. 06 December, 2019 1 commit
    • H
      Add Much Complex Test and Fix Bugs for Control Flow cond API (#21532) · 1dcf6a72
      Huihuang Zheng committed
      Add tests that use dy/dx to verify that the gradient values computed by the control flow backward pass are correct. Also fix the bugs detected by those tests.
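A dy/dx check of this kind can be sketched in plain Python. This is only an illustration of the idea, not the test added in the PR: `cond_fn`, `analytic_grad`, and `numeric_grad` are hypothetical names, and a central finite difference stands in for the framework's backward pass.

```python
def cond_fn(x: float) -> float:
    """Toy stand-in for a cond: one branch when x > 0, another otherwise."""
    return x * x if x > 0 else -3.0 * x


def analytic_grad(x: float) -> float:
    """Hand-derived dy/dx of cond_fn for the branch that x selects."""
    return 2.0 * x if x > 0 else -3.0


def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of dy/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    # Probe one point on each branch, away from the branch boundary at x = 0.
    for x in (2.0, -1.5):
        assert abs(numeric_grad(cond_fn, x) - analytic_grad(x)) < 1e-4
    print("gradients match on both branches")
```

Checking a point on each branch matters because a backward bug may only show up when one particular branch runs.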
      
      Fix bugs:
      
      1. Unlike sum_op, optimizer ops don't allow uninitialized input tensors. But in conditional_block_grad_op, since the conditional_block may not run, the output gradient tensor may be uninitialized, which causes an error in the optimizer op. To fix this, we should either let optimizer ops support uninitialized input like sum_op does, or set the uninitialized gradient to 0 when conditional_block_grad_op doesn't run. I found there are 10+ optimizer ops. **To keep it simple, this PR just sets the output gradient of conditional_block_grad_op to 0**. It could be explored further whether optimizer ops can support uninitialized input tensors like sum_op does, because in theory skipping the assignment in conditional_block_grad_op would be faster.
      
      2. Infer parameter shapes during append_backward. I didn't know that all our parameters are in the global block. When op_desc infers shapes in a sub-block, it may not know the shapes of the gradients of parameters whose shape information is in the global block. I fixed it by inferring the gradient shapes from the forward vars.
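The first fix above can be pictured with a toy sketch. It does not use Paddle; every function here is a hypothetical stand-in, with `None` playing the role of an uninitialized gradient tensor and a list of floats playing the role of a tensor.

```python
from typing import List, Optional


def branch_backward(ran: bool, upstream: List[float]) -> Optional[List[float]]:
    """Toy backward of a conditional block: no gradient is produced if the block did not run."""
    return [2.0 * g for g in upstream] if ran else None


def branch_backward_zero_filled(ran: bool, upstream: List[float]) -> List[float]:
    """Same backward, but a skipped block yields an explicit zero gradient."""
    grad = branch_backward(ran, upstream)
    return grad if grad is not None else [0.0] * len(upstream)


def sgd_step(param: List[float], grad: List[float], lr: float = 0.1) -> List[float]:
    """Toy optimizer step that requires an initialized gradient."""
    return [p - lr * g for p, g in zip(param, grad)]


if __name__ == "__main__":
    param = [1.0, 2.0]
    grad = branch_backward_zero_filled(ran=False, upstream=[1.0, 1.0])
    # A zero gradient leaves the parameters unchanged instead of raising an error.
    print(sgd_step(param, grad))
```

The cost of this approach is the extra zero-fill on every skipped branch, which is the overhead the message suggests could be avoided if optimizer ops accepted uninitialized inputs.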
      
      This PR also does some code cleanup:
      1. Print the var name when sgd_op catches a shape error so that it is easier to debug
      2. Fix a typo: dicta -> dict
  8. 29 November, 2019 1 commit
    • H
      Fix Cond Bug for Nested Control Flow (#21340) · 630be319
      Huihuang Zheng committed
      * Commit before merging develop
      
      test=develop
      
      * Backup after working with Huihuang logs
      
      * Commit before deleting Huihuang debug loggings
      
      * Commit before debug
      
      test=develop
      
      * Fix bug commit
      
      test=develop
      
      * Backup of fixing bugs
      
      test=develop
      
      * Clean up code
      
      test=develop
      
      * Fix a bug in sum_op
      
      test=develop
  9. 11 November, 2019 1 commit