1. 09 Jan 2018 (1 commit)
    • [FLINK-7903] [tests] Add flip6 build profile · 2d97cc18
      Committed by Till Rohrmann
      The flip6 build profile only runs the Flip-6 related test cases. Moreover,
      all Flip-6 related test cases are excluded when not running the flip6 build
      profile. This should reduce testing time when adding more and more Flip-6
      test cases.
      
      Include flink-test-utils-junit in all submodules to make the Category marker interfaces Flip6 and OldAndFlip6 available
      
      This closes #4889.
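The commit relies on JUnit's category mechanism to include/exclude test classes per build profile. A minimal self-contained sketch of that mechanism follows; the real marker interfaces live in flink-test-utils-junit, and all names here (the stand-in `@Category` annotation, `Flip6StyleTest`, `inCategory`) are illustrative, not Flink's actual code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class CategoryFilterSketch {

    // Stand-in for JUnit's @Category annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Category {
        Class<?>[] value();
    }

    // Stand-in for the Flip6 marker interface from flink-test-utils-junit.
    public interface Flip6 {}

    @Category(Flip6.class) // selected only when the flip6 profile runs
    public static class Flip6StyleTest {}

    // Uncategorized test: excluded by the flip6 profile, run otherwise.
    public static class LegacyTest {}

    // True if testClass carries a category matching `wanted`
    // (the assignability rule JUnit's category filtering uses).
    public static boolean inCategory(Class<?> testClass, Class<?> wanted) {
        Category c = testClass.getAnnotation(Category.class);
        return c != null && Arrays.stream(c.value()).anyMatch(wanted::isAssignableFrom);
    }
}
```

In the actual build, Surefire's `<groups>`/`<excludedGroups>` settings do this filtering based on the marker interfaces.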
  2. 06 Jan 2018 (2 commits)
  3. 28 Nov 2017 (1 commit)
  4. 24 Nov 2017 (1 commit)
  5. 22 Nov 2017 (2 commits)
  6. 11 Nov 2017 (1 commit)
  7. 09 Nov 2017 (1 commit)
  8. 07 Nov 2017 (1 commit)
  9. 01 Nov 2017 (3 commits)
    • [FLINK-7876] Register TaskManagerMetricGroup under ResourceID · d45b9412
      Committed by Till
      This commit changes the registration so that TaskManagerMetricGroups are now
      registered under the TaskManager's ResourceID instead of the InstanceID. This
      allows creating the TaskManagerMetricGroup at TaskManager startup.
      
      Moreover, it pulls the MetricRegistry out of JobManager and TaskManager. This
      allows reusing the same MetricRegistry across multiple instances (e.g. in the
      FlinkMiniCluster case). It also ensures proper cleanup of a potentially
      started MetricQueryServiceActor.
      
      Change TaskManagersHandler to work with ResourceID instead of InstanceID
      
      Adapt MetricFetcher to use ResourceID instead of InstanceID
      
      This closes #4872.
    • 358aacda
    • [FLINK-7400][cluster] fix cut-off memory not used for off-heap reserve as intended · 0df8e079
      Committed by Nico Kruber
      + fix description of `containerized.heap-cutoff-ratio`
      
      [FLINK-7400][yarn] add an integration test for yarn container memory restrictions using off-heap memory
      
      [FLINK-7400] address PR comments
      
      This closes #4506.
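The cut-off logic in question can be sketched as follows. The 0.25 ratio and 600 MB minimum mirror the documented defaults of `containerized.heap-cutoff-ratio` and `containerized.heap-cutoff-min`; the method names are illustrative, not Flink's actual API.

```java
// Sketch of the container memory cut-off computation (assumed semantics).
public class HeapCutoffSketch {
    static final double CUTOFF_RATIO = 0.25; // containerized.heap-cutoff-ratio default
    static final long CUTOFF_MIN_MB = 600;   // containerized.heap-cutoff-min default

    // Memory reserved from the container for off-heap use (JVM overhead,
    // native libraries, direct memory), i.e. NOT given to the JVM heap.
    static long cutoffMb(long containerMb) {
        long cutoff = (long) (containerMb * CUTOFF_RATIO);
        return Math.max(cutoff, CUTOFF_MIN_MB);
    }

    // What remains for -Xmx after the cut-off is reserved.
    static long heapSizeMb(long containerMb) {
        return containerMb - cutoffMb(containerMb);
    }
}
```

The fix referenced above ensures this reserved portion is actually usable as an off-heap reserve (e.g. via the direct memory limit) rather than silently lost.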
  10. 31 Oct 2017 (1 commit)
  11. 14 Oct 2017 (1 commit)
  12. 27 Sep 2017 (2 commits)
  13. 20 Sep 2017 (1 commit)
  14. 18 Aug 2017 (1 commit)
    • [FLINK-7057][blob] move ref-counting from the LibraryCacheManager to the BlobCache · 7b236240
      Committed by Nico Kruber
      Also change from BlobKey-based ref-counting to job-based ref-counting, which is
      simpler and the mode we want to use from now on. Deferred cleanup (as before)
      is not yet implemented (TODO).
      At the BlobServer, no ref-counting will be used but the cleanup will happen
      when the job enters a final state (TODO).
      
      [FLINK-7057][blob] change to a cleaner API for BlobService#registerJob()
      
      [FLINK-7057][blob] implement deferred cleanup at the BlobCache
      
      Whenever a job is no longer referenced at the BlobCache, we set a TTL and let
      the periodic cleanup task remove it once the TTL has expired. For now, this
      means that a BLOB will be retained at most
      (2 * ConfigConstants.LIBRARY_CACHE_MANAGER_CLEANUP_INTERVAL) seconds after not
      being referenced anymore. We do this so that a recovery still has the chance to
      use existing files rather than downloading them again.
      
      [FLINK-7057][blob] integrate cleanup of job-related JARs from the BlobServer
      
      TODO: an integration test that verifies that this is actually done when desired
      and not performed when not, e.g. if the job did not reach a final execution
      state
      
      [FLINK-7057][tests] extract FailingBlockingInvokable from CoordinatorShutdownTest
      
      [FLINK-7057][blob] add an integration test for the BlobServer cleanup
      
      This ensures that BLOB files are actually deleted when a job enters a final
      state.
      
      [FLINK-7057][tests] refrain from catching an exception just to fail the test
      
      removes code like this in the BLOB store unit tests:
      
      catch (Exception e) {
          e.printStackTrace();
          fail(e.getMessage());
      }
      
      [FLINK-7057][blob] fix BlobServer#cleanupJob() being too eager
      
      Instead of deleting the job's directory, it was deleting the parent storage
      directory.
      
      [FLINK-7057][blob] fix the BlobServer cleanup integration test
      
      * the test did not check the correct directories for cleanup
      * the test did not honour the test timeout
      
      [FLINK-7057][blob] test and fix BlobServer cleanup for a failed job submission
      
      [FLINK-7057][blob] rework the LibraryCacheManager API
      
      Since ref-counting has moved to the BlobCache, the BlobLibraryCacheManager is
      just a thin wrapper to get a user class loader by retrieving BLOBs from the
      BlobCache/BlobServer. Therefore, move the job-registration/-release out of it,
      too, and restrict its use to the task manager where the BlobCache is used (on
      the BlobServer, jobs do not need registration since they are only used once and
      will be deleted when they enter a final state).
      
      This makes the BlobServer and BlobCache instances available at the JobManager
      and TaskManager instances, respectively, also enabling future use cases outside
      of the LibraryCacheManager.
      
      [FLINK-7057][blob] address PR comments
      
      [FLINK-7057][blob] fix JobManagerLeaderElectionTest
      
      [FLINK-7057][blob] re-introduce some ref-counting for BlobLibraryCacheManager
      
      Apparently, we do need to return the same ClassLoader for different (parallel)
      tasks of a job running on the same task manager. Therefore, keep the initial
      task registration implementation that was removed with
      8331fbb208d975e0c1ec990344c14315ea08dd4a and only adapt it here. This also
      restores some tests and adds new combinations not tested before.
      
      [FLINK-7057][blob] address PR comments
      
      [FLINK-7057][tests] fix (manual/ignored) BlobCacheCleanupTest#testJobDeferredCleanup()
      
      [FLINK-7057][hotfix] fix a checkstyle error
      
      [FLINK-7057][blob] remove the extra lock object from BlobCache
      
      We can lock on jobRefCounters instead, which is what we are guarding anyway.
      
      [FLINK-7057][blob] minor improvements to the TTL in BlobCache
      
      Do not use Long.MAX_VALUE as a code for "keep forever". Also add more comments.
      
      [FLINK-7057][blob] replace "library-cache-manager.cleanup.interval" with "blob.service.cleanup.interval"
      
      Since we moved the cleanup to the BLOB service classes, this only makes sense.
      
      [FLINK-7057][hotfix] remove an unused import
      
      [FLINK-7057][docs] adapt javadocs of JobManager descendents
      
      [FLINK-7057][blob] increase JobManagerCleanupITCase timeout
      
      The previous value of 15s seems to be too low for some runs on Travis.
      
      [FLINK-7057][blob] providing more debug output in JobManagerCleanupITCase
      
      In case the BlobServer's directory is not cleaned within the remaining time,
      also print which files remain. This may help debugging the situation.
      
      This closes #4238.
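The job-based ref-counting with deferred (TTL-based) cleanup described above can be sketched as follows. This is a simplified illustration; the class and field names (`BlobRefCounterSketch`, `jobRefCounters`, `keepUntil`) are modeled on the commit's description, not copied from Flink's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class BlobRefCounterSketch {
    // stand-in for blob.service.cleanup.interval (1 hour, in ms)
    static final long CLEANUP_INTERVAL_MS = 3_600_000;

    static class RefCount {
        int references;
        long keepUntil = -1; // -1: actively referenced, no expiry set
    }

    private final Map<String, RefCount> jobRefCounters = new HashMap<>();

    public void registerJob(String jobId) {
        synchronized (jobRefCounters) { // lock on the map we are guarding
            RefCount rc = jobRefCounters.computeIfAbsent(jobId, k -> new RefCount());
            rc.references++;
            rc.keepUntil = -1; // referenced again: cancel any pending expiry
        }
    }

    public void releaseJob(String jobId, long nowMs) {
        synchronized (jobRefCounters) {
            RefCount rc = jobRefCounters.get(jobId);
            if (rc != null && --rc.references == 0) {
                // deferred cleanup: keep files so a recovery can reuse them
                rc.keepUntil = nowMs + CLEANUP_INTERVAL_MS;
            }
        }
    }

    /** Called by the periodic cleanup task; true if the job's BLOBs were removed. */
    public boolean cleanup(String jobId, long nowMs) {
        synchronized (jobRefCounters) {
            RefCount rc = jobRefCounters.get(jobId);
            if (rc != null && rc.references == 0 && rc.keepUntil >= 0 && nowMs >= rc.keepUntil) {
                jobRefCounters.remove(jobId); // delete the job's BLOB directory here
                return true;
            }
            return false;
        }
    }
}
```

Since the cleanup task itself runs on the same interval, a BLOB may survive up to twice the interval after its last release, matching the "at most 2 * interval" bound stated above.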
  15. 10 Aug 2017 (1 commit)
  16. 09 Aug 2017 (1 commit)
  17. 07 Aug 2017 (1 commit)
  18. 31 Jul 2017 (1 commit)
  19. 28 Jul 2017 (1 commit)
  20. 27 Jul 2017 (1 commit)
    • [FLINK-7113] Make ClusterDescriptor independent of cluster size · 7cf997d1
      Committed by Till Rohrmann
      The deploySession method is now given a ClusterSpecification which specifies the
      size of the cluster it is supposed to deploy.
      
      Remove 2 line breaks, unnecessary parameters for YarnTestBase#Runner, add builder for ClusterSpecification
      
      This closes #4271.
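A specification-plus-builder shape like the one this commit introduces might look as follows. The field names mirror typical cluster sizing parameters but are assumptions for illustration, not the exact Flink API.

```java
public class ClusterSpecificationSketch {
    private final int masterMemoryMB;
    private final int taskManagerMemoryMB;
    private final int numberTaskManagers;
    private final int slotsPerTaskManager;

    private ClusterSpecificationSketch(Builder b) {
        this.masterMemoryMB = b.masterMemoryMB;
        this.taskManagerMemoryMB = b.taskManagerMemoryMB;
        this.numberTaskManagers = b.numberTaskManagers;
        this.slotsPerTaskManager = b.slotsPerTaskManager;
    }

    public int getMasterMemoryMB() { return masterMemoryMB; }
    public int getNumberTaskManagers() { return numberTaskManagers; }
    public int getSlotsPerTaskManager() { return slotsPerTaskManager; }

    // Aggregate the deployer would request from the resource manager.
    public int totalTaskManagerMemoryMB() { return taskManagerMemoryMB * numberTaskManagers; }

    public static class Builder {
        private int masterMemoryMB = 1024;      // illustrative defaults
        private int taskManagerMemoryMB = 1024;
        private int numberTaskManagers = 1;
        private int slotsPerTaskManager = 1;

        public Builder setMasterMemoryMB(int mb) { masterMemoryMB = mb; return this; }
        public Builder setTaskManagerMemoryMB(int mb) { taskManagerMemoryMB = mb; return this; }
        public Builder setNumberTaskManagers(int n) { numberTaskManagers = n; return this; }
        public Builder setSlotsPerTaskManager(int s) { slotsPerTaskManager = s; return this; }

        public ClusterSpecificationSketch build() { return new ClusterSpecificationSketch(this); }
    }
}
```

With this shape, `deploySession(spec)` no longer needs the descriptor itself to carry sizing state, which is the point of the change.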
  21. 26 Jul 2017 (1 commit)
  22. 13 Jul 2017 (1 commit)
  23. 10 Jul 2017 (1 commit)
  24. 07 Jul 2017 (1 commit)
    • [FLINK-7042] [yarn] Fix jar file discovery in flink-yarn-tests · 709f23e7
      Committed by Greg Hogan
      Adds dependencies for the batch and streaming WordCount programs and copies
      the jar files into a new target/programs directory. The integration
      tests now directly reference the program jar files rather than relying on the
      prior brittle search.
      
      This removes the flink-yarn-tests build-time dependency on the examples
      modules (there remains a build-time dependency on flink-dist).
      
      This closes #4264
  25. 02 Jun 2017 (1 commit)
  26. 29 May 2017 (1 commit)
  27. 26 May 2017 (1 commit)
    • [FLINK-6646] [yarn] Let YarnJobManager delete Yarn application files · 6429e593
      Committed by Till Rohrmann
      Previously, the YarnClusterClient decided when to delete the Yarn application files.
      This is problematic because the client does not know whether a Yarn application
      is being restarted or terminated. As a result, the files were always deleted. This
      prevents Yarn from restarting a failed ApplicationMaster, effectively thwarting
      Flink's HA capabilities.
      
      The PR changes the behaviour such that the YarnJobManager deletes the Yarn files
      when it receives a StopCluster message. That way, we can be sure that the Yarn files
      are deleted only if the cluster is intended to be shut down.
  28. 25 May 2017 (1 commit)
  29. 17 May 2017 (2 commits)
    • [FLINK-6519] Integrate BlobStore in lifecycle management of HighAvailabilityServices · 88b0f2ac
      Committed by Till Rohrmann
      The HighAvailabilityService creates a single BlobStoreService instance which is
      shared by all BlobServer and BlobCache instances. The BlobStoreService's lifecycle
      is exclusively managed by the HighAvailabilityServices. This means that the
      BlobStore's content is only cleaned up if the HighAvailabilityService's HA data
      is cleaned up. Having this single point of control makes it easier to decide when
      to discard HA data (e.g. in case of a successful job execution) and when to retain
      the data (e.g. for recovery).
      
      Close and cleanup all data of BlobStore in HighAvailabilityServices
      
      Use HighAvailabilityServices to create BlobStore
      
      Introduce BlobStoreService interface to hide close and closeAndCleanupAllData methods
      
      This closes #3864.
    • [FLINK-6581] [cli] Correct dynamic property parsing for YARN cli · dceb5cc1
      Committed by Till Rohrmann
      The YARN cli now splits a dynamic property at the first occurrence of
      the = sign instead of at every = sign. That way we support dynamic
      properties of the form -yDenv.java.opts="-DappName=foobar".
      
      Address PR comments
      
      This closes #3903.
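The core of this fix is a one-liner: split with a limit of 2 so only the first = separates key from value. A minimal sketch (class and method names are illustrative):

```java
public class DynamicPropertySketch {
    /**
     * Splits a dynamic property at the FIRST '=' only, so the value may itself
     * contain '=' characters. Returns {key, value}, or null if there is no '='.
     */
    static String[] parse(String dynamicProperty) {
        // limit 2: split at the first '=' and keep the remainder intact
        String[] kv = dynamicProperty.split("=", 2);
        return kv.length == 2 ? kv : null;
    }
}
```

With the old behavior (no limit), "env.java.opts=-DappName=foobar" would be split into three fragments and the value would be truncated at the second '='.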
  30. 11 May 2017 (1 commit)
  31. 08 May 2017 (1 commit)
  32. 05 May 2017 (2 commits)
    • [FLINK-6078] Remove CuratorFramework#close calls from ZooKeeper based HA services · ddd6a99a
      Committed by Till Rohrmann
      Remove client-less factory methods from ZooKeeperUtils
      
      Introduce default job id
      
      This closes #3781.
    • [FLINK-6136] Separate EmbeddedHaServices and StandaloneHaServices · a0bb99c7
      Committed by Till Rohrmann
      This PR introduces a standalone high availability services implementation which can be used
      in a distributed setting with no HA guarantees. Additionally, it introduces a common base
      class which is also used by the EmbeddedHaServices. This base class instantiates the
      standalone variants of the checkpoint recovery factory, submitted job graphs store, running
      jobs registry and blob store.
      
      The StandaloneHaServices are instantiated with a fixed address for the Job- and
      ResourceManager. This address and the HighAvailability.DEFAULT_LEADER_ID are returned by
      the corresponding LeaderRetrievalServices when being started.
      
      This closes #3622.
  33. 21 Feb 2017 (1 commit)
    • [FLINK-4770] [core] Introduce 'CoreOptions' · a4047965
      Committed by Stephan Ewen
      The CoreOptions should hold all essential configuration values that are not specific to
      JobManager, TaskManager or any feature area, like HighAvailability or Security.
      
      Examples include:
        - default java options
        - default parallelism
        - default state backend
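The grouping described above follows the typed config-option pattern. A minimal stand-in illustrating that pattern (simplified; Flink's real ConfigOption/ConfigOptions API differs in detail, though the keys "parallelism.default" and "env.java.opts" are real Flink configuration keys):

```java
public class CoreOptionsSketch {

    // A typed configuration key with a default value.
    static final class ConfigOption<T> {
        public final String key;
        public final T defaultValue;

        ConfigOption(String key, T defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }
    }

    // Essential, feature-area-independent settings grouped in one place,
    // as the commit message describes.
    public static final ConfigOption<Integer> DEFAULT_PARALLELISM =
            new ConfigOption<>("parallelism.default", 1);

    public static final ConfigOption<String> FLINK_JVM_OPTIONS =
            new ConfigOption<>("env.java.opts", "");
}
```

Grouping these constants in one class gives every component a single, typed source of truth for core keys instead of scattered string literals.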