1. 10 Oct, 2017 17 commits
  2. 09 Oct, 2017 2 commits
  3. 07 Oct, 2017 9 commits
  4. 06 Oct, 2017 6 commits
  5. 05 Oct, 2017 6 commits
    • [FLINK-7068][blob] Introduce permanent and transient BLOB keys · 84a07a34
      Nico Kruber committed
      [FLINK-7068][blob] address PR review comments, part 1
      
      [FLINK-7068][blob] create a common base class for the BLOB caches
      
      [FLINK-7068][blob] update some comments
      
      [FLINK-7068][blob] integrate the BLOB type into the BlobKey
      
      [FLINK-7068][blob] rename a few methods for better consistency
      
      [FLINK-7068][blob] fix Blob*DeleteTest not working as documented in one test
      
      [FLINK-7068][blob] add checks for jobId being null in PermanentBlobCache
      
      [FLINK-7068][blob] implement get-and-delete logic for transient BLOBs
      
      Transient BLOB files are deleted on the BlobServer upon first access from a
      cache. Therefore, we do not need the DELETE operations anymore, aside from
      deleting the file from the local cache (for now).
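The get-and-delete behaviour described above can be sketched as follows. This is a minimal in-memory illustration, not Flink's implementation: the class `TransientStoreSketch` and its methods are hypothetical names, and a map stands in for the files kept by the server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for server-side transient BLOB storage.
class TransientStoreSketch {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    void put(String key, byte[] data) {
        blobs.put(key, data);
    }

    // Get-and-delete: the first caller receives the data and the entry is
    // removed in the same atomic step; any later caller gets null.
    byte[] getAndDelete(String key) {
        return blobs.remove(key);
    }
}
```

Because retrieval and removal are a single operation, no separate DELETE request is needed on the serving side; only a local copy held by the caller would still need cleaning up.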
      
      [FLINK-7068][blob] address PR comments, part 2
      
      [FLINK-7068][blob] separate permanent and transient BLOB keys
      
      * create PermanentBlobKey and TransientBlobKey (inheriting from BlobKey) and
        forbid using transient BLOBs with permanent caches and vice versa
      * make BlobKey package-private, similarly for the BlobType which is now
        reflected by the two BlobKey sub-classes
      -> this gives a cleaner interface for the user
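The idea of encoding the BLOB type in the key class can be sketched as below. All names ending in `Sketch` are hypothetical and only mirror the structure described in the commit message; they are not Flink's actual classes or signatures.

```java
// Common base class; in the commit it is package-private so that users only
// ever handle one of the two concrete key types.
abstract class KeySketch {
    final String hash;

    KeySketch(String hash) {
        this.hash = hash;
    }
}

final class PermanentKeySketch extends KeySketch {
    PermanentKeySketch(String hash) {
        super(hash);
    }
}

final class TransientKeySketch extends KeySketch {
    TransientKeySketch(String hash) {
        super(hash);
    }
}

// Each cache accepts only its own key type, so mixing them up is a
// compile-time error rather than a runtime check.
class PermanentCacheSketch {
    String describe(PermanentKeySketch key) {
        return "permanent:" + key.hash;
    }
}

class TransientCacheSketch {
    String describe(TransientKeySketch key) {
        return "transient:" + key.hash;
    }
}
```

With this layout, `new PermanentCacheSketch().describe(new TransientKeySketch("abc"))` does not compile, which is the "forbid using transient BLOBs with permanent caches" property expressed through the type system.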
      
      This closes #4358.
    • [FLINK-7261][blob] extend BlobStore#get/put with boolean return values · b57330dc
      Nico Kruber committed
      This way, using code can distinguish non-HA cases, i.e. VoidBlobStore, from
      HA cases, i.e. FileSystemBlobStore, in a general way and have better error
      reporting.
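A rough sketch of how a boolean return value lets calling code tell the two cases apart. The interface and classes below are hypothetical simplifications; the real `BlobStore` methods take different parameters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified store interface: put() reports whether the data
// was actually persisted.
interface StoreSketch {
    boolean put(String key, byte[] data);
}

// Non-HA case: nothing is stored, so put() always reports false.
class VoidStoreSketch implements StoreSketch {
    @Override
    public boolean put(String key, byte[] data) {
        return false;
    }
}

// HA case: a map stands in for a file-system-backed store.
class MapStoreSketch implements StoreSketch {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public boolean put(String key, byte[] data) {
        files.put(key, data);
        return true;
    }
}

class UploaderSketch {
    // The caller needs no instanceof checks; the return value is enough.
    static String upload(StoreSketch store, String key, byte[] data) {
        return store.put(key, data) ? "persisted" : "local only";
    }
}
```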
    • [FLINK-7068][blob] change BlobService sub-classes for permanent and transient BLOBs · 071e27f7
      Nico Kruber committed
      [FLINK-7068][blob] start introducing a new BLOB storage abstraction
      
      This is incomplete and may not compile and/or run tests successfully yet.
      
      [FLINK-7068][blob] remove BlobView from TransientBlobCache
      
      The transient BLOB cache is not supposed to work with the HA store since it only
      serves non-HA files.
      
      [FLINK-7068][blob] remove unnecessary use of BlobClient
      
      [FLINK-7068][blob] implement TransientBlobCache#put methods
      
      [FLINK-7068][blob] remove further unnecessary use of BlobClient and adapt to HA get/put methods
      
      [FLINK-7068][blob] fix BlobServer#getFileInternal not being guarded by locks
      
      [FLINK-7068][blob] add incoming file cleanup at BlobServer in cases of errors
      
      [FLINK-7068] fix missing BlobServer#putHA() jobId propagation
      
      [FLINK-7068][blob] remove BlobClient use from BlobServer{Get|Put}Test
      
      [FLINK-7068][blob] make helper methods work with any BlobService
      
      [FLINK-7068][blob] start adding a BlobCacheGetTest
      
      [FLINK-7068][blob] verify get contents in separate threads
      
      This makes it possible, with a small chance, to observe an intermediate file.
      
      [FLINK-7068][blob] better locking granularity during file retrieval
      
      This allows multiple parallel downloads from the HA store to the BlobServer's
      local store although only one of these downloaded staging files will actually
      be used. In practice, this happens only during recovery and not in parallel
      anyway.
      
      [FLINK-7068][blob] share more code among BlobServer and BlobServerConnection
      
      This also applies the better locking granularity of the previous commit to
      BlobServerConnection.
      
      [FLINK-7068][blob] properly cleanup temporary staging files in all cases
      
      [FLINK-7068][blob] make PermanentBlobCache and TransientBlobCache thread-safe
      
      [FLINK-7068][tests] improve various tests
      
      [FLINK-7068][blob] change the signature of the delete calls to return success
      
      We will not throw exceptions in case of failures anymore and return whether the
      operation was successful instead. Failure details will still be accessible in
      the written logs.
      
      [FLINK-7068][tests] extend and adapt BlobServerDeleteTest
      
      [FLINK-7068][tests] adapt further BlobCache tests
      
      [FLINK-7068][tests] adapt BlobClientTest
      
      [FLINK-7068][blob] cleanup BlobClient methods
      
      BlobClient is not supposed to be used by anyone other than the
      BlobServer/BlobCache classes. Most accessors were already package-private; now
      remove the ones that just bloat the code.
      
      [FLINK-7068] add a TODO to fix the currently failing tests
      
      [FLINK-7068][tests] add a BlobCacheRecoveryTest
      
      This currently fails due to TransientBlobCache#put also storing files in HA
      store which it should not!
      
      [FLINK-7068][tests] improve failure message
      
      [FLINK-7068][blob] add permanent/transient BLOB modes to BlobClient
      
      This allows better control over which BLOBs end up in the HA store and which do
      not. Also, during GET methods, we do not check the HA store unnecessarily.
      
      [FLINK-7068][tests] extend the Blob{Server|Cache}GetTest
      
      This adds some failing GET operations and verifies that the files are cleaned
      up accordingly.
      
      [FLINK-7068][blob] remove "final" flag from BlobCache class
      
      This re-enables mocking in various unit tests.
      
      [FLINK-7068][tests] fix test relying on order of folder contents
      
      [FLINK-7068][blob] some BlobServer cleanup
      
      [FLINK-7068][hotfix] fix checkstyle errors
      
      [FLINK-7068][tests] fix tests now requiring a more complete BlobCache mock
      
      A suitable BlobCache mock should at least return a mock for a permanent and a
      transient BLOB store, so mock(BlobCache.class) is not sufficient anymore.
      
      [FLINK-7068] final wrap up
      
      * remove a left-over TODO
      * remove useless tests for the concurrency of the GET operations (we cannot test
      that the file write is guarded by a lock directly - rely on the concurrent
      checks in the individual threads instead)
      * fix some log messages
      
      [FLINK-7068][blob] remove Thread#start() call from BlobServer constructor
      
      This is bad design and limits extensibility, e.g. in tests like the
      BlobCacheRetriesTest where this caused a race condition with the sub-class.
      Instead, the user must now call BlobServer#start() explicitly.
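Why starting a thread from a constructor interferes with sub-classes can be illustrated with a small sketch. `ServerSketch` and `CustomServerSketch` are hypothetical names; the point is only that the base constructor runs before the sub-class fields are assigned, so the thread should be started by the caller once construction is complete.

```java
import java.util.concurrent.CountDownLatch;

class ServerSketch extends Thread {
    private final CountDownLatch ran = new CountDownLatch(1);
    private volatile String observed;

    ServerSketch() {
        // Deliberately no start() here: at this point a sub-class
        // constructor has not run yet, so its fields are still unset.
    }

    protected String greeting() {
        return "base";
    }

    @Override
    public void run() {
        observed = greeting();
        ran.countDown();
    }

    // Blocks until run() has executed and returns what it saw.
    String awaitObserved() {
        try {
            ran.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed;
    }
}

class CustomServerSketch extends ServerSketch {
    private final String greeting;

    CustomServerSketch(String greeting) {
        super();
        this.greeting = greeting; // assigned only after the base constructor
    }

    @Override
    protected String greeting() {
        return greeting;
    }
}
```

Usage: `CustomServerSketch s = new CustomServerSketch("hello"); s.start();`. If the base constructor called `start()` itself, `run()` could execute before `greeting` was assigned and observe `null`.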
      
      [FLINK-7068][tests] remove unused imports
      
      [FLINK-7068][tests] fix a typo
      
      [FLINK-7068][tests] add some tests that verify behaviour with corrupted files
      
      Also add corruption checks for HA-store downloads, which were not implemented yet.
      
      [FLINK-7068][blob] ensure consistency in PermanentBlobCache even in cases of invalid use
      
      During cleanup, no write lock was taken but the storage directory of an
      (unused!) job was deleted. Normally, there should be no process left accessing
      its data and no new process can jump in since the registration is locked. In
      case of invalid use cases, i.e. using a job's data outside a register() and
      release() block, this could lead to strange effects.
      By guarding the cleanup with the write lock as well, we prevent that.
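A minimal sketch of guarding cleanup with the write lock, assuming a `ReadWriteLock` protects the per-job storage. `JobStorageSketch` is a hypothetical class and a map stands in for the job's storage directory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class JobStorageSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, byte[]> files = new HashMap<>();

    void put(String jobId, byte[] data) {
        lock.writeLock().lock();
        try {
            files.put(jobId, data);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock and may run concurrently with each other.
    byte[] read(String jobId) {
        lock.readLock().lock();
        try {
            return files.get(jobId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Cleanup takes the write lock, so it cannot delete a job's data while
    // any reader, even one outside a register()/release() block, is using it.
    boolean cleanup(String jobId) {
        lock.writeLock().lock();
        try {
            return files.remove(jobId) != null;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```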
      
      [FLINK-7068][hotfix] remove an unused import
    • [FLINK-7057][tests][hotfix] fix test instability of... · 98f6dea1
      Nico Kruber committed
      [FLINK-7057][tests][hotfix] fix test instability of JobManagerCleanupITCase#testBlobServerCleanupCancelledJob
      
      This test expected two messages to arrive (job cancellation and job state change
      notification) but did not take different receive orders into account. The fix:
      - removes state change listening for this test case so that only one message
        arrives, and
      - adds message comparison by object, not just class (to improve debugging)
    • [FLINK-7483][blob] prevent cleanup of re-registered jobs · 40ef9082
      Nico Kruber committed
      When a job is registered, it may have been released before and we thus need to
      reset the cleanup timeout again.
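The effect of resetting the cleanup timeout on re-registration can be sketched like this. `JobRefSketch` is a hypothetical tracker, time is passed in explicitly to keep the example deterministic, and the cleanup check intentionally looks only at the deadline so that the reset in `register()` is what protects a re-registered job.

```java
import java.util.HashMap;
import java.util.Map;

class JobRefSketch {
    private static final long NOT_SCHEDULED = -1L;

    private static class Ref {
        int count;
        long keepUntil = NOT_SCHEDULED;
    }

    private final Map<String, Ref> refs = new HashMap<>();
    private final long ttlMillis;

    JobRefSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    synchronized void register(String jobId) {
        Ref ref = refs.computeIfAbsent(jobId, k -> new Ref());
        ref.count++;
        // The job may have been released before: cancel any pending cleanup.
        ref.keepUntil = NOT_SCHEDULED;
    }

    synchronized void release(String jobId, long now) {
        Ref ref = refs.get(jobId);
        if (ref == null) {
            return;
        }
        ref.count = Math.max(0, ref.count - 1);
        if (ref.count == 0) {
            // Last user gone: schedule cleanup after the time-to-live.
            ref.keepUntil = now + ttlMillis;
        }
    }

    // Returns true if the job's data was cleaned up at time 'now'.
    synchronized boolean cleanupIfExpired(String jobId, long now) {
        Ref ref = refs.get(jobId);
        if (ref != null && ref.keepUntil != NOT_SCHEDULED && now >= ref.keepUntil) {
            refs.remove(jobId);
            return true;
        }
        return false;
    }
}
```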
    • [FLINK-7754] [rpc] Complete termination future after actor has been stopped · 4947ee66
      Till Rohrmann committed
      This commit waits not only until the Actor has called postStop but also until the actor
      has been completely shut down by the ActorSystem before completing the termination
      future.
      
      This closes #4770.