1. 18 May, 2020 3 commits
  2. 17 May, 2020 1 commit
  3. 16 May, 2020 4 commits
  4. 15 May, 2020 1 commit
  5. 14 May, 2020 1 commit
  6. 13 May, 2020 4 commits
  7. 12 May, 2020 1 commit
    • Z
      [FLINK-16536][network][checkpointing] Implement InputChannel state recovery... · d7525baf
      Committed by Zhijiang
      [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint
      
      When recovering from an unaligned checkpoint, the input channel state must be recovered in addition to the existing operator states.
      
      The implementation provides three guarantees:
      1. Input recovery happens after output recovery, so that more floating buffers are available to the output side first.
      2. Partition requests happen after input recovery, so that new data cannot overtake the previously recovered state data.
      3. A dedicated, single IO executor unspills the channel state one by one, to avoid potential random IO.
      
      This closes #11687.
      d7525baf
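The ordering described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `RecoveryOrderSketch`, `recover`, and the step names are hypothetical, and the sketch assumes that "single IO executor" means a single-threaded executor. It only records the order in which the steps run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; none of these names are Flink classes. */
final class RecoveryOrderSketch {

    /** Runs the recovery phases in order and returns the recorded step log. */
    static List<String> recover(List<String> inputChannels) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        // Assumption: one thread serializes all state reads, one channel at a time.
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            // Guarantee 1: output state is restored before any input state.
            CompletableFuture<Void> chain =
                    CompletableFuture.runAsync(() -> log.add("recover-output"), ioExecutor);
            // Guarantee 3: each channel's state read is chained behind the previous one.
            for (String channel : inputChannels) {
                chain = chain.thenRunAsync(() -> log.add("recover-input:" + channel), ioExecutor);
            }
            // Guarantee 2: partitions are requested only after all input state is back.
            chain = chain.thenRun(() -> log.add("request-partitions"));
            chain.join();
        } finally {
            ioExecutor.shutdown();
        }
        return new ArrayList<>(log);
    }
}
```

Calling `RecoveryOrderSketch.recover(Arrays.asList("a", "b"))` yields the steps in the order output, input `a`, input `b`, partition request. Chaining every stage on one future is what keeps new data from being requested before the old state is restored.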
  8. 01 May, 2020 1 commit
  9. 27 April, 2020 3 commits
    • A
    • X
    • K
      [FLINK-16404][runtime] Avoid caching buffers for blocked input channels before barrier alignment · 2e313f02
      Committed by kevin.cyj
      This commit is the first part of the implementation that solves the deadlock problem which occurs when the number of exclusive buffers on the receiver side is reduced to 0.
      
      Reducing the number of exclusive buffers on the receiver side to 0 brings several advantages (possibly at the cost of some performance regression). One is that memory is saved through reduced network buffer usage. Another important benefit is that the amount of in-flight data is reduced, which speeds up checkpoints under back pressure. With the current implementation, however, reducing the receiver-side exclusive buffers can cause a deadlock, because all floating buffers might be requested by blocked input channels and never recycled until barrier alignment.
      
      To solve the problem, this commit mainly makes the following changes:
      1. On the sender side, when aligned exactly-once checkpoint mode is used, the outgoing channel is blocked after it sends a checkpoint barrier, and no data is sent until the channel is unblocked.
      2. On the receiver side, buffers are no longer stored in BufferStorage. After a checkpoint is completed or canceled, the receiver side resumes data consumption and unblocks the upstream by sending a special event to the sender side.
      
      Note that after this patch the number of receiver-side exclusive buffers still cannot be set to 0, because a deadlock problem remains. It will be fully solved in follow-up patches.
      2e313f02
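The sender-side blocking described in this commit message can be illustrated with a small sketch. It is a simplified model under stated assumptions, not Flink's actual channel implementation: `BlockingChannelSketch`, `emit`, and `resumeConsumption` are hypothetical names, and records are plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sender-side channel; not Flink's actual subpartition class. */
final class BlockingChannelSketch {
    static final String BARRIER = "BARRIER";

    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> sent = new ArrayList<>();
    private boolean blocked;

    /** Queues a record or barrier, then sends whatever is allowed to go out. */
    void emit(String element) {
        pending.add(element);
        drain();
    }

    /** Models the arrival of the receiver's special "resume" event. */
    void resumeConsumption() {
        blocked = false;
        drain();
    }

    /** Returns a copy of everything sent downstream so far. */
    List<String> sent() {
        return new ArrayList<>(sent);
    }

    private void drain() {
        while (!blocked && !pending.isEmpty()) {
            String next = pending.poll();
            sent.add(next);
            // After a barrier goes out, hold back all later data until unblocked.
            if (BARRIER.equals(next)) {
                blocked = true;
            }
        }
    }
}
```

Because data behind the barrier stays on the sender, the receiver has nothing to cache for a blocked channel, which is the behavior the commit title refers to.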
  10. 22 April, 2020 1 commit
  11. 17 April, 2020 2 commits
  12. 12 April, 2020 1 commit
  13. 06 April, 2020 1 commit
    • D
      [FLINK-16913] Migrate StateBackends to use ReadableConfig instead of Configuration · bb46756b
      Committed by Dawid Wysakowicz
      StateBackendFactories do not need full read and write access to the
      Configuration object. Read-only access is sufficient. Moreover,
      ReadableConfig is a lightweight interface that can be implemented in
      other ways, not just through the Configuration. Lastly, we exposed this
      lightweight interface as a configuration entry point for
      ExecutionEnvironments. This change makes it possible to pass the
      ReadableConfig directly to the StateBackendFactories without fragile
      adapters.
      bb46756b
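The design argument in this commit message, that a factory which only reads configuration should depend on a narrow read-only interface, might look like this in miniature. Everything here is hypothetical: `ReadOnlyConfig` is a simplified stand-in, not Flink's `ReadableConfig`, and the keys and the `describeBackend` factory are invented for illustration.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; a simplified stand-in, not Flink's ReadableConfig. */
final class ReadOnlyConfigSketch {

    /** Narrow read-only view: callers can look values up but cannot modify them. */
    interface ReadOnlyConfig {
        Optional<String> get(String key);
    }

    /** Hypothetical factory: it only reads, so it asks for the narrow interface. */
    static String describeBackend(ReadOnlyConfig config) {
        String kind = config.get("backend.kind").orElse("memory");
        String dir = config.get("backend.checkpoint-dir").orElse("<none>");
        return kind + "@" + dir;
    }

    /** Any key-value source can be adapted without a dedicated adapter class. */
    static ReadOnlyConfig fromMap(Map<String, String> values) {
        return key -> Optional.ofNullable(values.get(key));
    }
}
```

Since the interface has a single method, a map, a lambda, or a full configuration object can all serve as the source, which is the flexibility the message attributes to a lightweight interface.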
  14. 03 April, 2020 1 commit
    • M
      [FLINK-16705] Ensure MiniCluster shutdown does not interfere with JobResult retrieval · db81417b
      Committed by Maximilian Michels
      There is a race condition in `LocalExecutor` between (a) shutting down the
      cluster when the job has finished and (b) the client which retrieves the result
      of the job execution.
      
      This was observed in Beam, running a large test suite with the Flink Runner.
      
      We should make sure the job result retrieval and the cluster shutdown do not
      interfere. This adds a PerJobMiniClusterClient which guarantees that.
      
      Improve message for running flag state checks in MiniCluster
      
      Additionally check for the JobID in PerJobMiniClusterClient
      
      Introduce PerJobMiniCluster and a corresponding JobClient
      
      Add TestLogger to test
      
      Convert shutdown methods to be async
      
      This closes #11473.
      db81417b
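One way to keep cluster shutdown from interfering with result retrieval is to chain the shutdown behind the retrieval on the same future. The sketch below shows that idea only; it is not the `PerJobMiniClusterClient` from the commit, and `PerJobClientSketch`, `completeJob`, and `shutDownAsync` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-job client; not the PerJobMiniClusterClient from the commit. */
final class PerJobClientSketch {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final CompletableFuture<String> jobFinished = new CompletableFuture<>();

    /** The (simulated) cluster calls this when the job completes. */
    void completeJob(String result) {
        jobFinished.complete(result);
    }

    boolean isRunning() {
        return running.get();
    }

    /** The result is captured first; only then is the async shutdown started. */
    CompletableFuture<String> getJobExecutionResult() {
        return jobFinished.thenCompose(
                result -> shutDownAsync().thenApply(ignored -> result));
    }

    /** Asynchronous shutdown, in the spirit of "convert shutdown methods to be async". */
    private CompletableFuture<Void> shutDownAsync() {
        return CompletableFuture.runAsync(() -> running.set(false));
    }
}
```

With this shape the client never observes a cluster that has shut down before the result was read, because the shutdown stage cannot start until the result value is already in hand.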
  15. 02 April, 2020 2 commits
  16. 31 March, 2020 1 commit
  17. 28 March, 2020 1 commit
  18. 26 March, 2020 3 commits
  19. 25 March, 2020 4 commits
  20. 24 March, 2020 3 commits
  21. 23 March, 2020 1 commit