- 14 Dec, 2017 18 commits
-
-
Committed by shuai.xus
Summary: Currently, Flink only lets users specify CPU and memory, but some users need to specify other resources such as GPU and FPGA. The resource type in ResourceSpec therefore needs to be extensible. Add an extended field for new resources. Test Plan: UnitTest. Reviewers: haitao.w. Differential Revision: https://aone.alibaba-inc.com/code/D327427. Make Resource abstract and add GPUResource and FPGAResource. This closes #4911. Add a resource spec builder and remove FPGAResource.
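The extension described in this commit can be pictured with a minimal sketch. The class name ResourceSpecSketch, its fields, and the string keys such as "GPU" are hypothetical and chosen only for illustration; this is not Flink's actual ResourceSpec API. It assumes the extended resources are kept in a name-to-amount map populated through a builder.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not Flink's actual ResourceSpec.
final class ResourceSpecSketch {
    private final double cpuCores;
    private final int heapMemoryMb;
    // Extended resources (for example GPU or FPGA), keyed by an assumed name.
    private final Map<String, Double> extendedResources;

    private ResourceSpecSketch(double cpuCores, int heapMemoryMb, Map<String, Double> extended) {
        this.cpuCores = cpuCores;
        this.heapMemoryMb = heapMemoryMb;
        this.extendedResources = Collections.unmodifiableMap(new HashMap<>(extended));
    }

    double getCpuCores() {
        return cpuCores;
    }

    int getHeapMemoryMb() {
        return heapMemoryMb;
    }

    // Returns 0.0 for a resource that was never requested.
    double getExtendedResource(String name) {
        return extendedResources.getOrDefault(name, 0.0);
    }

    static Builder newBuilder() {
        return new Builder();
    }

    static final class Builder {
        private double cpuCores;
        private int heapMemoryMb;
        private final Map<String, Double> extended = new HashMap<>();

        Builder setCpuCores(double cpuCores) {
            this.cpuCores = cpuCores;
            return this;
        }

        Builder setHeapMemoryMb(int heapMemoryMb) {
            this.heapMemoryMb = heapMemoryMb;
            return this;
        }

        // Adding the same resource name twice sums the amounts.
        Builder addExtendedResource(String name, double amount) {
            extended.merge(name, amount, Double::sum);
            return this;
        }

        ResourceSpecSketch build() {
            return new ResourceSpecSketch(cpuCores, heapMemoryMb, extended);
        }
    }
}
```

With a map of this kind, a new resource type needs no new field on the spec itself, which is one plausible reading of "extensible" here.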
-
Committed by Till Rohrmann
-
Committed by Till Rohrmann
This commit adds support for queued scheduling with slot sharing to the SlotPool. The idea of slot sharing is that multiple tasks can run in the same slot. Queued scheduling means that a slot request need not be completed right away but can be completed at a later point in time. This allows new TaskExecutors to be started when no slots are left. The main component responsible for managing shared slots is the SlotSharingManager. Internally, the SlotSharingManager maintains a tree-like structure which stores the SlotContext future of the underlying AllocatedSlot. Whenever this future is completed, any pending LogicalSlot instantiations are executed and sent to the slot requester. A shared slot is represented by a MultiTaskSlot which can harbour multiple TaskSlots. A TaskSlot can either be a MultiTaskSlot or a SingleTaskSlot. In order to represent co-location constraints, we first obtain a root MultiTaskSlot and then allocate a nested MultiTaskSlot in which the co-located tasks are allocated. The corresponding SlotRequestID is assigned to the CoLocationConstraint in order to make the TaskSlot retrievable for other tasks assigned to the same CoLocationConstraint. Port SchedulerSlotSharingTest, SchedulerIsolatedTasksTest and ScheduleWithCoLocationHintTest to run with SlotPool. Restructure SlotPool components. Add SlotSharingManagerTest, SlotPoolSlotSharingTest and SlotPoolCoLocationTest. This closes #5091.
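The tree structure described above can be illustrated with a small sketch. All class names ending in Sketch are hypothetical, and the slot context is reduced to a plain String; this is an assumption-laden illustration and not the SlotSharingManager implementation. It shows a root multi-task slot holding a future, a nested multi-task slot for co-located tasks, and leaf slots whose logical-slot futures complete once the shared future completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch only; not Flink's SlotSharingManager classes.
abstract class TaskSlotSketch {
    abstract int countLeaves();
}

// Leaf of the tree: one task bound to the shared underlying slot.
final class SingleTaskSlotSketch extends TaskSlotSketch {
    // Completes only after the shared slot context future completes.
    final CompletableFuture<String> logicalSlotFuture;

    SingleTaskSlotSketch(CompletableFuture<String> slotContextFuture, String taskName) {
        this.logicalSlotFuture = slotContextFuture.thenApply(ctx -> taskName + "@" + ctx);
    }

    @Override
    int countLeaves() {
        return 1;
    }
}

// Inner node: may contain single-task slots and nested multi-task slots.
final class MultiTaskSlotSketch extends TaskSlotSketch {
    private final CompletableFuture<String> slotContextFuture;
    private final List<TaskSlotSketch> children = new ArrayList<>();

    MultiTaskSlotSketch(CompletableFuture<String> slotContextFuture) {
        this.slotContextFuture = slotContextFuture;
    }

    // A nested multi-task slot, e.g. to group co-located tasks.
    MultiTaskSlotSketch allocateMultiTaskSlot() {
        MultiTaskSlotSketch child = new MultiTaskSlotSketch(slotContextFuture);
        children.add(child);
        return child;
    }

    SingleTaskSlotSketch allocateSingleTaskSlot(String taskName) {
        SingleTaskSlotSketch child = new SingleTaskSlotSketch(slotContextFuture, taskName);
        children.add(child);
        return child;
    }

    @Override
    int countLeaves() {
        int leaves = 0;
        for (TaskSlotSketch child : children) {
            leaves += child.countLeaves();
        }
        return leaves;
    }
}
```

In this sketch, tasks can be registered in the tree before any slot exists; completing the root future then resolves every pending leaf, which mirrors the queued-scheduling idea in the commit message.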
-
Committed by Till Rohrmann
-
Committed by Till Rohrmann
Not only check for a slot request with the right allocation id but also check whether we can fulfill other pending slot requests with an unclaimed offered slot before adding it to the list of available slots. This closes #5090.
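The offer-handling order described here can be sketched as follows. SlotOfferSketch and its String identifiers are hypothetical, and the sketch ignores resource-profile matching, which a real slot pool would likely need to consider; it only illustrates the three-step decision: exact allocation id match, then any other pending request, then the available list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; not Flink's SlotPool.
final class SlotOfferSketch {
    // Pending request ids keyed by the allocation id they are waiting for.
    private final Map<String, String> pendingByAllocationId = new LinkedHashMap<>();
    private final List<String> availableSlots = new ArrayList<>();

    void addPendingRequest(String allocationId, String requestId) {
        pendingByAllocationId.put(allocationId, requestId);
    }

    // Returns the id of the request fulfilled by the offer, or null if the
    // slot was added to the available slots instead.
    String offerSlot(String allocationId) {
        // 1. Prefer the request that is waiting for exactly this allocation id.
        String request = pendingByAllocationId.remove(allocationId);

        // 2. Otherwise use the unclaimed slot for the oldest pending request.
        if (request == null && !pendingByAllocationId.isEmpty()) {
            Iterator<Map.Entry<String, String>> it = pendingByAllocationId.entrySet().iterator();
            request = it.next().getValue();
            it.remove();
        }

        // 3. Only if nobody is waiting does the slot become available.
        if (request == null) {
            availableSlots.add(allocationId);
        }
        return request;
    }

    int availableSlotCount() {
        return availableSlots.size();
    }
}
```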
-
Committed by Till Rohrmann
Previously, logical slots like the SimpleSlot and SharedSlot were associated with the actually allocated slot via the AllocationID. This, however, was sub-optimal because allocated slots can also be re-used to fulfill other slot requests (logical slots). Therefore, we should bind the logical slots to the id with the right lifecycle, which is the slot request id. This closes #5089.
-
Committed by Till Rohrmann
This commit introduces the SlotContext which is an abstraction for the SimpleSlot to obtain the relevant slot information to do the communication with the TaskManager without relying on the AllocatedSlot which is now only used by the SlotPool. This closes #5088.
-
Committed by Till Rohrmann
The cluster entrypoints start the ResourceManager with the web interface URL. This URL is used to set the correct tracking URL in Yarn when registering the Yarn application. This closes #5128.
-
Committed by Till Rohrmann
[FLINK-8262] [tests] Harden IndividualRestartsConcurrencyTest.testLocalFailureFailsPendingCheckpoints. The problem was a concurrent restart attempt which failed because not enough slots were available. This failure would lead to the job failing and all pending checkpoints being discarded.
-
Committed by Stephan Ewen
-
Committed by Stephan Ewen
-
Committed by Stephan Ewen
-
Committed by Stephan Ewen
-
Committed by Till Rohrmann
Remove the isCanceled and isReleased methods and decouple the logical slot from the Execution by introducing a Payload interface which is set for a LogicalSlot. The Payload interface is implemented by the Execution and allows failing an implementation and obtaining a termination future. Introduce a proper Execution#releaseFuture which is completed once the Execution's assigned resource has been released. This closes #5087.
-
Committed by Till Rohrmann
The LogicalSlot interface decouples the task deployment from the actual slot implementations, which at the moment are Slot, SimpleSlot and SharedSlot. This is a helpful step towards introducing a different slot implementation for Flip-6. This closes #5086.
-
Committed by Till Rohrmann
This closes #4988.
-
Committed by Till Rohrmann
The WebMonitorEndpoint is the common rest endpoint used for serving the web frontend REST calls. It will be used by the Dispatcher and the JobMaster to fuel the web frontend. This closes #4987.
-
Committed by zentol
-
- 13 Dec, 2017 14 commits
-
-
Committed by gyao
Implement SubmittedJobGraphListener interface in Dispatcher
Call start() on SubmittedJobGraphStore with Dispatcher as listener. To enable this, the dispatcher must implement the SubmittedJobGraphListener interface. Add simple unit tests for the new methods. Refactor DispatcherTest to remove redundancy.
[FLINK-8176][flip6] Make InMemorySubmittedJobGraphStore thread-safe
[FLINK-8176][flip6] Add method isStarted() to TestingLeaderElectionService
[FLINK-8176][flip6] Return same RunningJobsRegistry instance from TestingHighAvailabilityServices
[FLINK-8176][flip6] Fix race conditions in Dispatcher and DispatcherTest
Check if jobManagerRunner exists before submitting job. Replace JobManagerRunner mock used in tests with real instance. Do not run job graph recovery in actor main thread when job graph is recovered from SubmittedJobGraphListener#onAddedJobGraph(JobID).
[FLINK-8176][flip6] Rename variables in DispatcherTest
[FLINK-8176][flip6] Remove injectMocks in DispatcherTest
[FLINK-8176][flip6] Update Dispatcher's SubmittedJobGraphListener callbacks
Always attempt the job submission if onAddedJobGraph or onRemovedJobGraph are called. The checks in submitJob and removeJob are sufficient.
This closes #5107.
-
Committed by gyao
-
Committed by gyao
-
Committed by Bowen Li
This closes #5129.
-
Committed by Joerg Schad
This closes #5157.
-
Committed by zentol
This closes #5153.
-
Committed by zentol
-
Committed by zentol
-
Committed by zentol
This closes #8213.
-
Committed by zentol
This closes #5099.
-
Committed by Nico Kruber
-
Committed by Nico Kruber
This closes #5127.
-
Committed by Nico Kruber
Previously, the ResultPartitionWriter implemented the EventListener interface and was used for event registration, although event publishing was already handled via the TaskEventDispatcher. Now, we use the TaskEventDispatcher for both event registration and publishing. This commit also adds the TaskEventDispatcher to the Environment information of a task so that the task can work with it (only IterationHeadTask so far). This closes #4761.
-
Committed by Nico Kruber
This closes #5147.
-
- 12 Dec, 2017 4 commits
-
-
Committed by zentol
This closes #5146.
-
Committed by Tony Wei
This closes #5115.
-
Committed by Nico Kruber
This closes #5064.
-
Committed by Raycee
This closes #5152.
-
- 08 Dec, 2017 1 commit
-
-
Committed by zentol
-
- 07 Dec, 2017 1 commit
-
-
Committed by zentol
This closes #5052.
-
- 06 Dec, 2017 2 commits