- 09 Jan, 2018 (1 commit)

Committed by Till Rohrmann
The flip6 build profile only runs the Flip-6 related test cases. Moreover, all Flip-6 related test cases are excluded when not running the flip6 build profile. This should reduce testing time when adding more and more Flip-6 test cases. Include flink-test-utils-junit in all submodules to make the Category marker interfaces Flip6 and OldAndFlip6 available. This closes #4889.
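The mechanism described above (empty marker interfaces used as test categories) can be illustrated with a minimal, self-contained sketch. Only the names Flip6 and OldAndFlip6 come from the commit message; the test classes and the selection helper are hypothetical stand-ins for what the JUnit Categories runner does:

```java
// Sketch of category markers as empty interfaces. Flip6 and OldAndFlip6
// mirror the names from the commit; the test classes below are made up.
public class CategorySketch {
    interface Flip6 {}
    interface OldAndFlip6 {}

    // A hypothetical test that only makes sense against the Flip-6 stack.
    static class RestEndpointTest implements Flip6 {}

    // A hypothetical legacy-only test carrying no marker.
    static class LegacyJobManagerTest {}

    // The flip6 profile would select exactly the classes carrying a marker.
    static boolean runsInFlip6Profile(Class<?> testClass) {
        return Flip6.class.isAssignableFrom(testClass)
                || OldAndFlip6.class.isAssignableFrom(testClass);
    }

    public static void main(String[] args) {
        System.out.println(runsInFlip6Profile(RestEndpointTest.class));
        System.out.println(runsInFlip6Profile(LegacyJobManagerTest.class));
    }
}
```

In the real build, this selection is done by Maven Surefire's category include/exclude settings rather than by hand-written reflection.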

- 06 Jan, 2018 (2 commits)

Committed by Greg Hogan
Fix typos from the IntelliJ "Typos" inspection. This closes #5242

Committed by Stephan Ewen

- 28 Nov, 2017 (1 commit)

Committed by Till Rohrmann
Add "Remote connection to [null] failed with java.nio.channels.NotYetConnectedException" to the list of whitelisted log statements in YarnTestBase. This logging statement seems to appear since we moved from Flakka to Akka 2.4.0. This closes #5085.

- 24 Nov, 2017 (1 commit)

Committed by Nico Kruber
This is another step toward using our own (off-heap) buffers for network communication that we pass through Netty in order to avoid unnecessary buffer copies. This closes #4481.

- 22 Nov, 2017 (2 commits)
- 11 Nov, 2017 (1 commit)

Committed by Aljoscha Krettek
Before, we had it only in the modules that require it. This doesn't work when running mvn javadoc:aggregate because that goal only runs for the root pom and then cannot find the "bundle" dependencies.

- 09 Nov, 2017 (1 commit)

Committed by Piotr Nowojski
Disable it in most modules.

- 07 Nov, 2017 (1 commit)

Committed by Aljoscha Krettek

- 01 Nov, 2017 (3 commits)

Committed by Till
This commit changes the registration of TaskManagerMetricGroups: they are now registered under the TaskManager's ResourceID instead of the InstanceID. This allows creating the TaskManagerMetricGroup at startup of the TaskManager. Moreover, it pulls the MetricRegistry out of the JobManager and TaskManager, which allows reusing the same MetricRegistry across multiple instances (e.g. in the FlinkMiniCluster case). Moreover, it ensures proper cleanup of a potentially started MetricQueryService actor. Change TaskManagersHandler to work with ResourceID instead of InstanceID. Adapt MetricFetcher to use ResourceID instead of InstanceID. This closes #4872.

Committed by Till Rohrmann

Committed by Nico Kruber
Fix the description of containerized.heap-cutoff-ratio. [FLINK-7400][yarn] Add an integration test for YARN container memory restrictions using off-heap memory. [FLINK-7400] Address PR comments. This closes #4506.

- 31 Oct, 2017 (1 commit)

Committed by zentol
This closes #4923.

- 14 Oct, 2017 (1 commit)

Committed by Aljoscha Krettek
We can do this now that we dropped support for Scala 2.10. This closes #4807

- 27 Sep, 2017 (2 commits)

Committed by Aljoscha Krettek

Committed by Aljoscha Krettek

- 20 Sep, 2017 (1 commit)

Committed by Till Rohrmann
This commit creates the DispatcherRestEndpoint and integrates it with the Dispatcher. The DispatcherRestEndpoint is created in the SessionClusterEntrypoint and its address is passed to the Dispatcher such that it can answer the requestRestAddress RPC. This closes #4598.

- 18 Aug, 2017 (1 commit)

Committed by Nico Kruber
Also change from BlobKey-based ref-counting to job-based ref-counting, which is simpler and the mode we want to use from now on. Deferred cleanup (as before) is currently not implemented yet (TODO). At the BlobServer, no ref-counting will be used; instead, cleanup will happen when the job enters a final state (TODO).

- [FLINK-7057][blob] change to a cleaner API for BlobService#registerJob()
- [FLINK-7057][blob] implement deferred cleanup at the BlobCache: whenever a job is not referenced at the BlobCache anymore, we set a TTL and let the cleanup task remove it when this is hit and the task is run. For now, this means that a BLOB will be retained at most (2 * ConfigConstants.LIBRARY_CACHE_MANAGER_CLEANUP_INTERVAL) seconds after not being referenced anymore. We do this so that a recovery still has the chance to use existing files rather than to download them again.
- [FLINK-7057][blob] integrate cleanup of job-related JARs from the BlobServer. TODO: an integration test that verifies that this is actually done when desired and not performed when not, e.g. if the job did not reach a final execution state.
- [FLINK-7057][tests] extract FailingBlockingInvokable from CoordinatorShutdownTest
- [FLINK-7057][blob] add an integration test for the BlobServer cleanup. This ensures that BLOB files are actually deleted when a job enters a final state.
- [FLINK-7057][tests] refrain from catching an exception just to fail the test; removes code like this in the BLOB store unit tests: catch (Exception e) { e.printStackTrace(); fail(e.getMessage()); }
- [FLINK-7057][blob] fix BlobServer#cleanupJob() being too eager: instead of deleting the job's directory, it was deleting the parent storage directory.
- [FLINK-7057][blob] fix BlobServer cleanup integration: the test did not check the correct directories for cleanup, and it did not honour the test timeout.
- [FLINK-7057][blob] test and fix BlobServer cleanup for a failed job submission
- [FLINK-7057][blob] rework the LibraryCacheManager API. Since ref-counting has moved to the BlobCache, the BlobLibraryCacheManager is just a thin wrapper to get a user class loader by retrieving BLOBs from the BlobCache/BlobServer. Therefore, move the job registration/release out of it, too, and restrict its use to the task manager where the BlobCache is used (on the BlobServer, jobs do not need registration since they are only used once and will be deleted when they enter a final state). This makes the BlobServer and BlobCache instances available at the JobManager and TaskManager instances, respectively, also enabling future use cases outside of the LibraryCacheManager.
- [FLINK-7057][blob] address PR comments
- [FLINK-7057][blob] fix JobManagerLeaderElectionTest
- [FLINK-7057][blob] re-introduce some ref-counting for BlobLibraryCacheManager. Apparently, we do need to return the same ClassLoader for different (parallel) tasks of a job running on the same task manager. Therefore, keep the initial task registration implementation that was removed with 8331fbb208d975e0c1ec990344c14315ea08dd4a and only adapt it here. This also restores some tests and adds new combinations not tested before.
- [FLINK-7057][blob] address PR comments
- [FLINK-7057][tests] fix (manual/ignored) BlobCacheCleanupTest#testJobDeferredCleanup()
- [FLINK-7057][hotfix] fix a checkstyle error
- [FLINK-7057][blob] remove the extra lock object from BlobCache; we can lock on jobRefCounters instead, which is what we are guarding anyway.
- [FLINK-7057][blob] minor improvements to the TTL in BlobCache: do not use Long.MAX_VALUE as a code for "keep forever"; also add more comments.
- [FLINK-7057][blob] replace "library-cache-manager.cleanup.interval" with "blob.service.cleanup.interval"; since we moved the cleanup to the BLOB service classes, this only makes sense.
- [FLINK-7057][hotfix] remove an unused import
- [FLINK-7057][docs] adapt javadocs of JobManager descendents
- [FLINK-7057][blob] increase JobManagerCleanupITCase timeout; the previous value of 15s seems to be too low for some runs on Travis.
- [FLINK-7057][blob] provide more debug output in JobManagerCleanupITCase; in case the BlobServer's directory is not cleaned within the remaining time, also print which files remain. This may help debugging the situation.

This closes #4238.
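The job-based ref-counting with TTL-deferred cleanup described above can be sketched roughly as follows. All names (RefCount, releaseJob, runCleanupTask) are illustrative assumptions, not Flink's actual BlobCache API:

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of job-based ref-counting with TTL-based deferred cleanup,
// as described in the commit message. Names are illustrative only.
public class BlobRefCountSketch {
    static final class RefCount {
        int references;
        long keepUntilMillis = -1; // -1 means: still referenced, keep
    }

    private final Map<String, RefCount> jobRefCounters = new HashMap<>();
    private final long cleanupIntervalMillis;

    BlobRefCountSketch(long cleanupIntervalMillis) {
        this.cleanupIntervalMillis = cleanupIntervalMillis;
    }

    void registerJob(String jobId) {
        RefCount rc = jobRefCounters.computeIfAbsent(jobId, k -> new RefCount());
        rc.references++;
        rc.keepUntilMillis = -1; // referenced again: cancel any pending TTL
    }

    void releaseJob(String jobId, long nowMillis) {
        RefCount rc = jobRefCounters.get(jobId);
        if (rc != null && --rc.references <= 0) {
            // Not referenced anymore: start a TTL instead of deleting now,
            // so a recovery can still reuse the existing files.
            rc.keepUntilMillis = nowMillis + cleanupIntervalMillis;
        }
    }

    // Periodic cleanup task: drop entries whose TTL has expired.
    void runCleanupTask(long nowMillis) {
        jobRefCounters.entrySet().removeIf(e ->
            e.getValue().references <= 0
                && e.getValue().keepUntilMillis >= 0
                && e.getValue().keepUntilMillis <= nowMillis);
    }

    boolean isTracked(String jobId) {
        return jobRefCounters.containsKey(jobId);
    }
}
```

Because the cleanup task itself only runs once per interval, an unreferenced BLOB can survive up to twice the configured interval, matching the (2 * interval) bound stated in the commit message.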

- 10 Aug, 2017 (1 commit)

Committed by zjureel
This closes #4075.

- 09 Aug, 2017 (1 commit)

Committed by Till Rohrmann
The Flip-6 YARN session cluster can now be started with yarn-session.sh --flip6. By default, the old YARN application master will be started. This closes #4465.

- 07 Aug, 2017 (1 commit)

Committed by zentol
This closes #4453.

- 31 Jul, 2017 (1 commit)

Committed by Till Rohrmann
Upload the user code jar from the JobGraph. This closes #4284.

- 28 Jul, 2017 (1 commit)

Committed by Till Rohrmann
Instead, the AbstractYarnClusterDescriptor is passed a Configuration instance which is sent to the started application master. Pass in the configuration directory manually. Remove configurationDirectory resolution from AbstractYarnClusterDescriptor. Address PR comments. This closes #4280.

- 27 Jul, 2017 (1 commit)

Committed by Till Rohrmann
The deploySession method is now given a ClusterSpecification which specifies the size of the cluster it is supposed to deploy. Remove 2 line breaks and unnecessary parameters for YarnTestBase#Runner; add a builder for ClusterSpecification. This closes #4271.
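A ClusterSpecification with a builder might look roughly like the sketch below. The field names and defaults are guesses based on the description, not the actual Flink class:

```java
// Illustrative sketch of a ClusterSpecification plus builder; field names
// and defaults are assumptions, not Flink's real class.
public class ClusterSpecification {
    private final int masterMemoryMB;
    private final int taskManagerMemoryMB;
    private final int numberTaskManagers;

    private ClusterSpecification(int masterMemoryMB, int taskManagerMemoryMB, int numberTaskManagers) {
        this.masterMemoryMB = masterMemoryMB;
        this.taskManagerMemoryMB = taskManagerMemoryMB;
        this.numberTaskManagers = numberTaskManagers;
    }

    int getMasterMemoryMB() { return masterMemoryMB; }
    int getTaskManagerMemoryMB() { return taskManagerMemoryMB; }
    int getNumberTaskManagers() { return numberTaskManagers; }

    static class Builder {
        private int masterMemoryMB = 1024;      // assumed defaults
        private int taskManagerMemoryMB = 1024;
        private int numberTaskManagers = 1;

        Builder setMasterMemoryMB(int mb) { this.masterMemoryMB = mb; return this; }
        Builder setTaskManagerMemoryMB(int mb) { this.taskManagerMemoryMB = mb; return this; }
        Builder setNumberTaskManagers(int n) { this.numberTaskManagers = n; return this; }

        ClusterSpecification build() {
            return new ClusterSpecification(masterMemoryMB, taskManagerMemoryMB, numberTaskManagers);
        }
    }
}
```

The builder lets test code such as YarnTestBase#Runner construct only the fields it cares about instead of threading every size parameter through constructors.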

- 26 Jul, 2017 (1 commit)

Committed by Till Rohrmann
Rename deploySession to deploySessionCluster and deployJob to deployJobCluster; add a ClusterDeploymentDescription to the deployJobCluster method. This closes #4270.

- 13 Jul, 2017 (1 commit)

Committed by zentol

- 10 Jul, 2017 (1 commit)

Committed by zjureel
This closes #4278.

- 07 Jul, 2017 (1 commit)

Committed by Greg Hogan
Add dependencies for the batch and streaming WordCount programs and copy the jar files into a new target/programs directory. The integration tests now directly reference the program jar files rather than the prior brittle search. This removes the flink-yarn-tests build-time dependency on the examples modules (there remains a build-time dependency on flink-dist). This closes #4264

- 02 Jun, 2017 (1 commit)

Committed by Greg Hogan
Update Hadoop versions and replace Hadoop 2.3 with 2.8 in build and continuous integration scripts. flink-yarn-tests can now be enabled for all supported Hadoop versions and is made non-optional. This closes #3832

- 29 May, 2017 (1 commit)

Committed by zentol
This closes #4005.

- 26 May, 2017 (1 commit)

Committed by Till Rohrmann
Before, the YarnClusterClient decided when to delete the YARN application files. This is problematic because the client does not know whether a YARN application is being restarted or terminated. Due to this, the files were always deleted. This prevents YARN from restarting a failed ApplicationMaster, effectively thwarting Flink's HA capabilities. This PR changes the behaviour such that the YarnJobManager deletes the YARN files if it receives a StopCluster message. That way, we can be sure that the YARN files are deleted only if the cluster is intended to be shut down.

- 25 May, 2017 (1 commit)

Committed by zentol
This closes #3985.

- 17 May, 2017 (2 commits)

Committed by Till Rohrmann
The HighAvailabilityServices create a single BlobStoreService instance which is shared by all BlobServer and BlobCache instances. The BlobStoreService's lifecycle is exclusively managed by the HighAvailabilityServices. This means that the BlobStore's content is only cleaned up if the HighAvailabilityServices' HA data is cleaned up. Having this single point of control makes it easier to decide when to discard HA data (e.g. in case of a successful job execution) and when to retain the data (e.g. for recovery). Close and clean up all data of the BlobStore in HighAvailabilityServices. Use HighAvailabilityServices to create the BlobStore. Introduce a BlobStoreService interface to hide the close and closeAndCleanupAllData methods. This closes #3864.

Committed by Till Rohrmann
The YARN CLI now splits a dynamic property at the first occurrence of the = sign instead of splitting it at every = sign. That way, we support dynamic properties of the form -yDenv.java.opts="-DappName=foobar". Address PR comments. This closes #3903.
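The split-at-first-= behaviour comes down to using a split limit in Java. This is a minimal sketch of the idea, not the actual YARN CLI code:

```java
// Minimal sketch of splitting a dynamic property only at the first '=',
// so values may themselves contain '=' signs. Not the actual CLI code.
public class DynamicPropertySplit {
    static String[] splitAtFirstEquals(String dynamicProperty) {
        // A limit of 2 yields at most two parts: the key and the full
        // remainder of the string, '=' signs included.
        return dynamicProperty.split("=", 2);
    }

    public static void main(String[] args) {
        String[] kv = splitAtFirstEquals("env.java.opts=-DappName=foobar");
        // key: env.java.opts, value: -DappName=foobar
        System.out.println(kv[0] + " = " + kv[1]);
    }
}
```

With split("=") (no limit), the same input would shatter into three pieces and the value would lose its embedded = sign.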

- 11 May, 2017 (1 commit)

Committed by Greg Hogan
Use scala.binary.version as defined in the parent POM and remove the script to swap scala version identifiers. This closes #3800

- 08 May, 2017 (1 commit)

Committed by Robert Metzger

- 05 May, 2017 (2 commits)

Committed by Till Rohrmann
Remove client-less factory methods from ZooKeeperUtils. Introduce a default job id. This closes #3781.

Committed by Till Rohrmann
This PR introduces a standalone high availability services implementation which can be used in a distributed setting with no HA guarantees. Additionally, it introduces a common base class which is also used by the EmbeddedHaServices. This base class instantiates the standalone variants of the checkpoint recovery factory, submitted job graphs store, running jobs registry and blob store. The StandaloneHaServices are instantiated with a fixed address for the Job- and ResourceManager. This address and the HighAvailability.DEFAULT_LEADER_ID is returned by the corresponding LeaderRetrievalServices when being started. This closes #3622.
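The idea of answering leader retrieval with a fixed, pre-configured address plus a constant default leader id can be sketched as follows. The class and method names are simplified stand-ins, not Flink's actual leader retrieval API:

```java
import java.util.UUID;
import java.util.function.BiConsumer;

// Simplified stand-in for a standalone leader retrieval service: it knows
// a fixed address up front and reports it (with a constant leader id)
// as soon as a listener is registered. Names are illustrative only.
public class StandaloneLeaderRetrievalSketch {
    // Stand-in for HighAvailability.DEFAULT_LEADER_ID.
    static final UUID DEFAULT_LEADER_ID = new UUID(0L, 0L);

    private final String fixedLeaderAddress;

    StandaloneLeaderRetrievalSketch(String fixedLeaderAddress) {
        this.fixedLeaderAddress = fixedLeaderAddress;
    }

    // On start, immediately notify the listener of the static leader;
    // there is no election, so the answer never changes afterwards.
    void start(BiConsumer<String, UUID> listener) {
        listener.accept(fixedLeaderAddress, DEFAULT_LEADER_ID);
    }
}
```

Since the address never changes, this gives no HA guarantees, but it is enough for a distributed setup where the JobManager and ResourceManager locations are fixed by configuration.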

- 21 Feb, 2017 (1 commit)

Committed by Stephan Ewen
The CoreOptions should hold all essential configuration values that are not specific to the JobManager, the TaskManager, or any feature area like HighAvailability or Security. Examples are:
- default Java options
- default parallelism
- default state backend
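The pattern behind such an options class (typed option keys carrying their own defaults, looked up from a flat configuration map) can be sketched like this. The option names and defaults below are illustrative assumptions, not Flink's actual ConfigOption API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of core options with defaults; not Flink's real API.
public class CoreOptionsSketch {
    static final class Option<T> {
        final String key;
        final T defaultValue;
        Option(String key, T defaultValue) { this.key = key; this.defaultValue = defaultValue; }
    }

    // Hypothetical options mirroring the examples in the commit message.
    static final Option<String> JAVA_OPTS = new Option<>("env.java.opts", "");
    static final Option<Integer> DEFAULT_PARALLELISM = new Option<>("parallelism.default", 1);
    static final Option<String> STATE_BACKEND = new Option<>("state.backend", "memory");

    private final Map<String, Object> values = new HashMap<>();

    <T> void set(Option<T> option, T value) { values.put(option.key, value); }

    // Falls back to the option's own default when no value was set.
    @SuppressWarnings("unchecked")
    <T> T get(Option<T> option) {
        return (T) values.getOrDefault(option.key, option.defaultValue);
    }
}
```

Bundling the default with the key means every reader of the option sees the same fallback, instead of scattering default values across JobManager, TaskManager, and feature-area code.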