Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
doujutun3207
flink
提交
5a86a0a1
F
flink
项目概览
doujutun3207
/
flink
与 Fork 源项目一致
从无法访问的项目Fork
通知
24
Star
0
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
F
flink
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
体验新版 GitCode,发现更多精彩内容 >>
提交
5a86a0a1
编写于
11月 17, 2015
作者:
U
Ufuk Celebi
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
[FLINK-3011] [runtime] Fix cancel during restart
上级
ceabbd07
变更
2
显示空白变更内容
内联
并排
Showing
2 changed file
with
169 addition
and
1 deletion
+169
-1
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
...g/apache/flink/runtime/executiongraph/ExecutionGraph.java
+28
-1
flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphRestartTest.java
...ink/runtime/executiongraph/ExecutionGraphRestartTest.java
+141
-0
未找到文件。
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
浏览文件 @
5a86a0a1
...
@@ -711,6 +711,26 @@ public class ExecutionGraph implements Serializable {
...
@@ -711,6 +711,26 @@ public class ExecutionGraph implements Serializable {
return
;
return
;
}
}
}
}
// Executions are being canceled. Go into cancelling and wait for
// all vertices to be in their final state.
else
if
(
current
==
JobStatus
.
FAILING
)
{
if
(
transitionState
(
current
,
JobStatus
.
CANCELLING
))
{
return
;
}
}
// All vertices have been cancelled and it's safe to directly go
// into the canceled state.
else
if
(
current
==
JobStatus
.
RESTARTING
)
{
synchronized
(
progressLock
)
{
if
(
transitionState
(
current
,
JobStatus
.
CANCELED
))
{
postRunCleanup
();
progressLock
.
notifyAll
();
LOG
.
info
(
"Canceled during restart."
);
return
;
}
}
}
else
{
else
{
// no need to treat other states
// no need to treat other states
return
;
return
;
...
@@ -747,9 +767,16 @@ public class ExecutionGraph implements Serializable {
...
@@ -747,9 +767,16 @@ public class ExecutionGraph implements Serializable {
public
void
restart
()
{
public
void
restart
()
{
try
{
try
{
synchronized
(
progressLock
)
{
synchronized
(
progressLock
)
{
if
(
state
!=
JobStatus
.
RESTARTING
)
{
JobStatus
current
=
state
;
if
(
current
==
JobStatus
.
CANCELED
)
{
LOG
.
info
(
"Canceled job during restart. Aborting restart."
);
return
;
}
else
if
(
current
!=
JobStatus
.
RESTARTING
)
{
throw
new
IllegalStateException
(
"Can only restart job from state restarting."
);
throw
new
IllegalStateException
(
"Can only restart job from state restarting."
);
}
}
if
(
scheduler
==
null
)
{
if
(
scheduler
==
null
)
{
throw
new
IllegalStateException
(
"The execution graph has not been scheduled before - scheduler is null."
);
throw
new
IllegalStateException
(
"The execution graph has not been scheduled before - scheduler is null."
);
}
}
...
...
flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphRestartTest.java
浏览文件 @
5a86a0a1
...
@@ -21,6 +21,7 @@ package org.apache.flink.runtime.executiongraph;
...
@@ -21,6 +21,7 @@ package org.apache.flink.runtime.executiongraph;
import
org.apache.flink.api.common.JobID
;
import
org.apache.flink.api.common.JobID
;
import
org.apache.flink.configuration.Configuration
;
import
org.apache.flink.configuration.Configuration
;
import
org.apache.flink.runtime.akka.AkkaUtils
;
import
org.apache.flink.runtime.akka.AkkaUtils
;
import
org.apache.flink.runtime.execution.ExecutionState
;
import
org.apache.flink.runtime.instance.Instance
;
import
org.apache.flink.runtime.instance.Instance
;
import
org.apache.flink.runtime.jobgraph.JobGraph
;
import
org.apache.flink.runtime.jobgraph.JobGraph
;
import
org.apache.flink.runtime.jobgraph.JobStatus
;
import
org.apache.flink.runtime.jobgraph.JobStatus
;
...
@@ -37,6 +38,9 @@ import java.util.concurrent.TimeUnit;
...
@@ -37,6 +38,9 @@ import java.util.concurrent.TimeUnit;
import
static
org
.
apache
.
flink
.
runtime
.
executiongraph
.
ExecutionGraphTestUtils
.
SimpleActorGateway
;
import
static
org
.
apache
.
flink
.
runtime
.
executiongraph
.
ExecutionGraphTestUtils
.
SimpleActorGateway
;
import
static
org
.
junit
.
Assert
.
assertEquals
;
import
static
org
.
junit
.
Assert
.
assertEquals
;
import
static
org
.
junit
.
Assert
.
fail
;
import
static
org
.
junit
.
Assert
.
fail
;
import
static
org
.
mockito
.
Mockito
.
doCallRealMethod
;
import
static
org
.
mockito
.
Mockito
.
doNothing
;
import
static
org
.
mockito
.
Mockito
.
spy
;
public
class
ExecutionGraphRestartTest
{
public
class
ExecutionGraphRestartTest
{
...
@@ -158,4 +162,141 @@ public class ExecutionGraphRestartTest {
...
@@ -158,4 +162,141 @@ public class ExecutionGraphRestartTest {
fail
(
"Failed to wait until all execution attempts left the state DEPLOYING."
);
fail
(
"Failed to wait until all execution attempts left the state DEPLOYING."
);
}
}
}
}
@Test
public
void
testCancelWhileRestarting
()
throws
Exception
{
Scheduler
scheduler
=
new
Scheduler
(
TestingUtils
.
defaultExecutionContext
());
Instance
instance
=
ExecutionGraphTestUtils
.
getInstance
(
new
SimpleActorGateway
(
TestingUtils
.
directExecutionContext
()),
NUM_TASKS
);
scheduler
.
newInstanceAvailable
(
instance
);
// Blocking program
ExecutionGraph
executionGraph
=
new
ExecutionGraph
(
TestingUtils
.
defaultExecutionContext
(),
new
JobID
(),
"TestJob"
,
new
Configuration
(),
AkkaUtils
.
getDefaultTimeout
());
JobVertex
jobVertex
=
new
JobVertex
(
"NoOpInvokable"
);
jobVertex
.
setInvokableClass
(
Tasks
.
NoOpInvokable
.
class
);
jobVertex
.
setParallelism
(
NUM_TASKS
);
JobGraph
jobGraph
=
new
JobGraph
(
"TestJob"
,
jobVertex
);
// We want to manually control the restart and delay
executionGraph
.
setNumberOfRetriesLeft
(
Integer
.
MAX_VALUE
);
executionGraph
.
setDelayBeforeRetrying
(
Integer
.
MAX_VALUE
);
executionGraph
.
attachJobGraph
(
jobGraph
.
getVerticesSortedTopologicallyFromSources
());
assertEquals
(
JobStatus
.
CREATED
,
executionGraph
.
getState
());
executionGraph
.
scheduleForExecution
(
scheduler
);
assertEquals
(
JobStatus
.
RUNNING
,
executionGraph
.
getState
());
// Kill the instance and wait for the job to restart
instance
.
markDead
();
Deadline
deadline
=
TestingUtils
.
TESTING_DURATION
().
fromNow
();
while
(
deadline
.
hasTimeLeft
()
&&
executionGraph
.
getState
()
!=
JobStatus
.
RESTARTING
)
{
Thread
.
sleep
(
100
);
}
assertEquals
(
JobStatus
.
RESTARTING
,
executionGraph
.
getState
());
// Canceling needs to abort the restart
executionGraph
.
cancel
();
assertEquals
(
JobStatus
.
CANCELED
,
executionGraph
.
getState
());
// The restart has been aborted
executionGraph
.
restart
();
assertEquals
(
JobStatus
.
CANCELED
,
executionGraph
.
getState
());
}
@Test
public
void
testCancelWhileFailing
()
throws
Exception
{
Scheduler
scheduler
=
new
Scheduler
(
TestingUtils
.
defaultExecutionContext
());
Instance
instance
=
ExecutionGraphTestUtils
.
getInstance
(
new
SimpleActorGateway
(
TestingUtils
.
directExecutionContext
()),
NUM_TASKS
);
scheduler
.
newInstanceAvailable
(
instance
);
// Blocking program
ExecutionGraph
executionGraph
=
new
ExecutionGraph
(
TestingUtils
.
defaultExecutionContext
(),
new
JobID
(),
"TestJob"
,
new
Configuration
(),
AkkaUtils
.
getDefaultTimeout
());
// Spy on the graph
executionGraph
=
spy
(
executionGraph
);
// Do nothing here, because we don't want to transition out of
// the FAILING state.
doNothing
().
when
(
executionGraph
).
jobVertexInFinalState
();
JobVertex
jobVertex
=
new
JobVertex
(
"NoOpInvokable"
);
jobVertex
.
setInvokableClass
(
Tasks
.
NoOpInvokable
.
class
);
jobVertex
.
setParallelism
(
NUM_TASKS
);
JobGraph
jobGraph
=
new
JobGraph
(
"TestJob"
,
jobVertex
);
// We want to manually control the restart and delay
executionGraph
.
setNumberOfRetriesLeft
(
Integer
.
MAX_VALUE
);
executionGraph
.
setDelayBeforeRetrying
(
Integer
.
MAX_VALUE
);
executionGraph
.
attachJobGraph
(
jobGraph
.
getVerticesSortedTopologicallyFromSources
());
assertEquals
(
JobStatus
.
CREATED
,
executionGraph
.
getState
());
executionGraph
.
scheduleForExecution
(
scheduler
);
assertEquals
(
JobStatus
.
RUNNING
,
executionGraph
.
getState
());
// Kill the instance...
instance
.
markDead
();
Deadline
deadline
=
TestingUtils
.
TESTING_DURATION
().
fromNow
();
// ...and wait for all vertices to be in state FAILED. The
// jobVertexInFinalState does nothing, that's why we don't wait on the
// job status.
boolean
success
=
false
;
while
(
deadline
.
hasTimeLeft
()
&&
!
success
)
{
success
=
true
;
for
(
ExecutionVertex
vertex
:
executionGraph
.
getAllExecutionVertices
())
{
if
(
vertex
.
getExecutionState
()
!=
ExecutionState
.
FAILED
)
{
success
=
false
;
Thread
.
sleep
(
100
);
break
;
}
}
}
// Still in failing
assertEquals
(
JobStatus
.
FAILING
,
executionGraph
.
getState
());
// The cancel call needs to change the state to CANCELLING
executionGraph
.
cancel
();
assertEquals
(
JobStatus
.
CANCELLING
,
executionGraph
.
getState
());
// Unspy and finalize the job state
doCallRealMethod
().
when
(
executionGraph
).
jobVertexInFinalState
();
executionGraph
.
jobVertexInFinalState
();
assertEquals
(
JobStatus
.
CANCELED
,
executionGraph
.
getState
());
}
}
}
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录