From 4895131e8e00729a10e4b8586a2d158b181c3070 Mon Sep 17 00:00:00 2001 From: Stephan Ewen Date: Mon, 5 Jan 2015 00:59:16 +0100 Subject: [PATCH] [FLINK-1335] [docs] Update internals with brief description about JobManager data structures and scheduling. This closes #281 --- docs/_includes/sidenav.html | 86 +- docs/css/main/main.css | 20 +- docs/img/ClientJmTm.svg | 526 ++++++----- docs/img/apache-incubator-logo.png | Bin 4234 -> 0 bytes docs/img/job_and_execution_graph.svg | 851 ++++++++++++++++++ docs/img/jobgraph_executiongraph.svg | 472 ---------- docs/img/slots.svg | 505 +++++++++++ docs/img/state_machine.svg | 488 +++++----- docs/internal_add_operator.md | 5 + docs/internal_distributed_akka.md | 6 + ...internal_overview.md => internal_howto.md} | 35 +- docs/internal_job_scheduling.md | 61 ++ docs/internal_logging.md | 9 +- 13 files changed, 2082 insertions(+), 982 deletions(-) mode change 100644 => 100755 docs/img/ClientJmTm.svg delete mode 100644 docs/img/apache-incubator-logo.png create mode 100755 docs/img/job_and_execution_graph.svg delete mode 100644 docs/img/jobgraph_executiongraph.svg create mode 100755 docs/img/slots.svg mode change 100644 => 100755 docs/img/state_machine.svg rename docs/{internal_overview.md => internal_howto.md} (66%) diff --git a/docs/_includes/sidenav.html b/docs/_includes/sidenav.html index fdddabdfe15..8a6585ff68a 100644 --- a/docs/_includes/sidenav.html +++ b/docs/_includes/sidenav.html @@ -17,57 +17,47 @@ specific language governing permissions and limitations under the License. --> diff --git a/docs/css/main/main.css b/docs/css/main/main.css index f30e402e96d..5c744757471 100755 --- a/docs/css/main/main.css +++ b/docs/css/main/main.css @@ -274,6 +274,24 @@ ul { font-size: 0.8em; } +/*---------------------------------------------------------------------- + Side navigation +----------------------------------------------------------------------*/ + +.sidenav-category { + font-weight: bold; + font-size: larger; + margin-top: 10px; +} + +.sidenav-item { + border-top: thin solid #AAAAAA; +} + +.sidenav-item-bottom { + border-bottom: thin solid #AAAAAA; border-top: thin solid #AAAAAA; +} + /*---------------------------------------------------------------------- Responsive CSS 768px / 992px / 1200px ----------------------------------------------------------------------*/ @@ -386,4 +404,4 @@ ul { .af-main-nav ul li.active a { background: #FA5F00; } -} \ No newline at end of file +} diff --git a/docs/img/ClientJmTm.svg b/docs/img/ClientJmTm.svg old mode 100644 new mode 100755 index 99bd209018c..b158b7d8388 --- a/docs/img/ClientJmTm.svg +++ b/docs/img/ClientJmTm.svg @@ -18,6 +18,8 @@ specific language governing permissions and limitations under the License. --> + + + version="1.1" + inkscape:version="0.48.5 r10040"> + + + id="metadata7"> @@ -44,221 +69,280 @@ under the License. - - - - - JobManager - - - - Client - - - - - - - - - - - Program - Submit Job - Compiler/ - Optimizer - Scheduling, - Resource Management - - - - TaskManager - - - - TaskManager - - - - - - - Task Execution, - Data Exchange - Task Execution, - Data Exchange + inkscape:label="Layer 1" + inkscape:groupmode="layer" + id="layer1" + transform="translate(0.17493185,-2.7660971)"> + + + + JobManager + + + + Client + + + + + + + + + + + Submit Job + Compiler/ + Optimizer + Scheduling, + Resource Management + + + + TaskManager + + + + TaskManager + + + + + + + Task Execution, + Data Exchange + Task Execution, + Data Exchange + + Exchange + Intermediate + Results + ( + shuffle + / + broadcast + ) + diff --git a/docs/img/apache-incubator-logo.png b/docs/img/apache-incubator-logo.png deleted file mode 100644 index 81fb31ec7128939673f04d4748788ac4989c0d17..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 4234 zcmV;55Owc~P)F3_w-5MGi+uGaL*456=&T?{c z($dn##>T_K!G3;z?(XiszP`4$wk9SfdU|>a3JRK;iMn;B?mP%S;j-ICK(ALzUsmy$Z)|QsyvbuST zmG0Wsxeha&6H43yExraY<^(XTA2QZTR;Els*{h>sZ*;D7ma+*#nh-CP2|VrtFRKwn zx-B%;J~Ey=F~)vs&a9eOX>Do{N_`4QvIsEB1}XUiFs>RZ!DClCNllh=reYXaf(m1X z5lf&7Q;rcwq6jtD2QRudIMz)zk}fvHY+zzadQuv0LJDPP5P@^W zL#h%m#u6>VP&%$UF1tWC3IYr&AUZ-ITsjp_OAdl&3R{i}Vv-6`zFaoWiiUbxhE5h* zN+DYk0~<0LIyDMXPYQ8b4tmyNQmkxJzK&|nvW|uhVpbnkLK9$W5@N~+Hr5j^xfC*i0N+@7j5nIMKHohK8s}NeACR%&a(biLe@Gm#;Ri0&T{U0Le`2xuAWMcf;z^w zPM(%d-Yzn>f=aQRLf+O=rk+x|t4PARhB_unQUW5Rlt8woO4hPc_TpNeiavT~GOUS0 zqkct^b~>_+QsTN&l883uLNdd-ZMLp*nw}1900001bW%=J06^y0W&i*H32;bRa{vGf z4gdfP4go;E;XwcZ00(qQO+^RO3KtFz883Dl;s5{*97#k$RCt{2T6=s`)g3qICb?;7 z(l)p4ZJUtL-iGu=-!ED!h+0b;pCy5z(4@#(YFl)cQVI+#l(GSuiVs#hox<>eqC*j1 zlZkt%L&ZJZdlTH8xasC&y2n2|ulr2erVn*v`+Yu1?&EjQIp6a=zwh~-b8`;(U$ArF zM{hWr%|uWd+5adrPj`WCgwizKrB22i(m_tzv=^?2D4A(5HHsM-h8b(j2p)>FAaTLD z?0E~%zu>~@tDKo6$~ZXIbrTs{Y4Cxfu^dW);6}d->g!7{YFOlL^j&<(rI$^jX;EyF zD9t(~Tr(!{1VULjrbo5`R3)?Q<@NRTS2Q&(_WA=$e2q)Lu0;X4s=W$RSe}#G!cPpg$^wGs@RF?W0Z)0M^P?%fHy#+P1p=nvS4% zvDeoW=4?xN{M zY2#g3^33t^Hpi^YtR$dUcL8iN{0QizOiD0xPJ$U*4J6ub74Kk_JUKy~ks!L2s_M-t zJ$Jcj)fR9=|JH5Y+XmX!Z693S>-R0`3Lg-7rkc3AhX@GJt{Al zP4KJ>eX$f5PB4H4o}pomAORN5Qvfg=$H0g2C6j@Ap>7g5&P0olEVF+3{Q0ZCx^vfV zaAW^X;c$3psHdl?qq(EAvoYx15b|~HSho4*tM^=X%dIn~iqgT0Aame&&CD{obMw-H z9Gf;wBSL4SC|S~QsmUJ95<-Sk4$gv)DLh!X3xDG??@svb&g-lE%94BftK3)tPi zH+=oJ*6m+wYaYJF-x+LdY`9~j1%7>f^PNkV-t~>EVnrGiB~OQ^9R=mA>@0U~;*7L> z5``>noSPMgEJevu{@7VX3{wY>6R?MfZvk+_US@-hG2TkkRJeYa09foe%rh`vbv^k9@1G>Cywsy3U^jO2%cAKqBQa&Q8or zgA)qxq7L>savM`+lt^1JOTfShn4~zxfaEA`7^NG@tHHpa6@xkVIIfuO?Eup}p zj~!WiWbN904?ezi3jjx-*x9{*aA2UVeShzh13jzP`+eSn;hv>WwR=yzy65Gb38J*{ zG@*n+>2MAJAHy&wi9^AJr>>qwG><_doZV0U0 z^xNOPaq9G&Z$>3G0;L5IDv7*i8)xU{!JtenEG;Xy8t@%7+=x7xunHky41*>OdLl@( zKgt~R3}fIKHJ7+c;6v~Zh`!FUKX=RV=Y7L1-|ZU-ELz|BaQpXq+P~j^@Ywdf;U^#J z|H0PXyTO*-FFbbp6Rp8ekH4d#<@bMh>y1CYefrFqccSnafsz*}lQ~FUq4FS*DTSq$ za*tJqZwX{hp!5lfte}Q;0X4{6j*cGdG+eSplG~`st^LK9*8FgTuc2k+#IUb1IK0H) zw5X}w|LAS2+YW}04c*k;edths|Kt6CI(qca*Dibgm3QBJ_b+Gudiu=!?~IGmP38+x zdEjt`%9EN}WGPE1r~m{dlntU!4tPZ7O@oV4#d7FKaKk%Dcg$rt-_fVA;uZLVyU1CB zG3FOIM?;C6UVi1(J8oWc;_#jKw+uIUM?#$)!%I%|EopjqyMJI?_h9eN@ctFu;hupP zZvWuJk3Rexv=2Y{=;MfKREttiHV*6vWOf!BuQ(_zh|AnDnmRQ*dH#Xd zj(6_)(Z-N3=naJeBfelDFcO3dmKANyA>aPyM*sFfe_wO^z3+bV*6EKwdH>Y=r(V<+ zqCx2fXeDb#@`{*5L5V?@m3u0S&qfimhgI|=aR z;<)$>P!e-GowI)O)7SU>tn2vkhM#v`y!3FxhPyUzSlRWDyPv=7%9l2eeDvl&|8+9j zlNywmEf|z&qQr?EN95I0P+446EvC2e8351)2|G2EaYTX2V{)ch6<(}4F1(Pno&3ct zbZmcaynI%Rk`t8&fy|we#-$>8P4GY_)04!Ulc?&PDJ*M6qbL(%a;7LJv9GE|i4@js zr}9b@J1nm-DATJMF)ph~b(s^dPHU6l(mnBN)F?SX<&|WwGZ4tqGAOSC0y!5@w=8jX zoU(l`Tv~WD&0dl6q#@<8VC9K}QT_Q>BAjnD_ zl!l6YtK?|o9MMXfuuX=`DUVo_0N2e9DY2V}l?Tf!1X-dhnRxl*R*x;s$l1nso&c9d zPUWRW2~xzI2PuyP5(TB9q((>ABsfclmPk(>_?%}f%7Wu78Jxt{W{DLu#9&;I1B;B3 zBN@M{Wwtm5tz!+0#2rOh!!Tm77g3L|uexXi(0ZDak8ZQd*Gm6vtInGZiIu zbEVX$R)vxiQ*uxv6N1xOV%6j-M^7baj0UA*3WipcAr%l8uR;m%+8y<7g>=CxcM#nj ztQ#Grse;HWTvDc`6c!SBEhbFP<#m#q5@iaIO-K9PT%ouKthkkFF`%T2PB;?u$^u+j z8kBe)lRQW1Q$X7$h&?_!O2$CtRgu`m;h?OWtC#XS6_@ckWnHxFN}#feSh*AzuA(YQ z{bY{hGHZ_7hm7JZ@pKF>1%R_xl-SA2gXFaiQYp?yfLZMT*O;_F``6c z)|?Nt=G1a2M7W@!c1mYsaOno;8qu~SS#6=9tfE1w=Yc#!D!3HeZUA!A#cRN15|11) zqeQtf2O`X<0xMV@OCb7fvS~J&Vno}ivDY%rxk+w|F%H0fG#*zOo zpe?L1QMw%t@hL%m*CNPQ(`1x21~`*AWN(MqmvVV7VXXO*b zGEyVts*1^xl~vi8C>5nLKJNW)iQh>7CZP{JB& z$c`OM?2-&6bv341od)E1T(bIz7TDzhCmB}3qzKn+RZ^ftX_j(hh@vp!!OGaVM4?hc zHnY+h9F$gF#0KR!T(X(k2H^KvnuLbrDh^GaM)Fa4%2%S4U(-V_Bgn+%#mXfLD=StW zBCmXgE@Ig_9+z@+h|$+usuUjNk&7O+tCjriL}E>aa;6x82!&JAKX_{|W-f79)K*~S zG4XShtD#uAWX*NT`s?WXnM`HoCyBPIM_jZ=Z6Zt+N@=ZwxYVpk$fio{TtX|N|K + + + + + + + + + + + image/svg+xml + + + + + + + + + + Intermediate + Result + + + Execution + Job Vertex + + + Execution + Job Vertex + + + Execution + Job Vertex + + JobGraph + + + JobVertex + (A) + + + JobVertex + (D) + + + JobVertex + (B) + + + + Intermediate + Data Set + + + Intermediate + Data Set + + + JobVertex + (C) + + + + + + + + Intermediate + Data Set + + ExecutionGraph + + + Execution + Vertex + A (0/2) + + + Execution + Vertex + A (1/2) + + + Intermediate + Result + + + Intermediate + Result + Partition + + + Intermediate + Result + Partition + + + + Execution + Vertex + B (0/2) + + + Execution + Vertex + B (1/2) + + + + + + Intermediate + Result + + + Intermediate + Result + Partition + + + Intermediate + Result + Partition + + + + + + + Execution + Vertex + D (0/2) + + + Execution + Vertex + D (1/2) + + + + + + + + + + + Intermediate + Result + Partition + + + Intermediate + Result + Partition + + + + + + diff --git a/docs/img/jobgraph_executiongraph.svg b/docs/img/jobgraph_executiongraph.svg deleted file mode 100644 index e95126fafcf..00000000000 --- a/docs/img/jobgraph_executiongraph.svg +++ /dev/null @@ -1,472 +0,0 @@ - - - - - - - - image/svg+xml - - - - - - - - - - JobGraph - - - ExecutionGraph - - - JobInput - Vertex - - JobTask - Vertex - - JobOutput - Vertex - - JobTask - Vertex - - - - Execution - Vertex - - Execution - Vertex - - Execution - Vertex - - Execution - Vertex - - Execution - Vertex - - Execution - Vertex - - Execution - Vertex - - Execution - Vertex - - - - - - - - - - - - - - - - ExecutionGroup - Vertex - ExecutionGroup - Vertex - ExecutionGroup - Vertex - Execution - Group - Vertex - Source - Map - Reduce - Sink - - diff --git a/docs/img/slots.svg b/docs/img/slots.svg new file mode 100755 index 00000000000..7d5dc2a3a72 --- /dev/null +++ b/docs/img/slots.svg @@ -0,0 +1,505 @@ + + + + + + + + + + + + image/svg+xml + + + + + + + + + TaskManager + 1 + + + + + + + + TaskManager + 2 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TaskManager + 1 + + + + + + + + TaskManager + 2 + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/img/state_machine.svg b/docs/img/state_machine.svg old mode 100644 new mode 100755 index 9bb711e9f57..8d0f5703796 --- a/docs/img/state_machine.svg +++ b/docs/img/state_machine.svg @@ -26,12 +26,39 @@ under the License. xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" - version="1.0" - width="230.41mm" - height="154.87mm" - id="svg2985"> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" + width="664.92505" + height="445.67966" + id="svg2" + version="1.1" + inkscape:version="0.48.5 r10040"> + + + id="metadata7"> @@ -42,213 +69,250 @@ under the License. - - - - CREATED - - - SCHE - - - DULED - - - CANCEL - - - ED - - - FAILED - - - - - DEPLOY - - - ING - - - FINISHED - - - RUNNING - - - - - - - CANCEL - - - ING - - - - - + inkscape:label="Layer 1" + inkscape:groupmode="layer" + id="layer1" + transform="translate(-42.537478,-309.52235)"> + + + + CREATED + + + SCHE + - + DULED + + + CANCEL + - + ED + + + FAILED + + + + + DEPLOY + - + ING + + + FINISHED + + + RUNNING + + + + + + + CANCEL + - + ING + + + + + + + diff --git a/docs/internal_add_operator.md b/docs/internal_add_operator.md index e370c148277..ddcb6b5c8d0 100644 --- a/docs/internal_add_operator.md +++ b/docs/internal_add_operator.md @@ -243,4 +243,9 @@ public DataSet mapPartition(MapPartitionFunction function) { } ~~~ +--- + +*This documentation is maintained by the contributors of the individual components. +We kindly ask anyone that adds and changes components to eventually provide a patch +or pull request that updates these documents as well.* diff --git a/docs/internal_distributed_akka.md b/docs/internal_distributed_akka.md index f0bf8e0c2d9..49497b25a4b 100644 --- a/docs/internal_distributed_akka.md +++ b/docs/internal_distributed_akka.md @@ -39,3 +39,9 @@ via Akka (http://akka.io) ## Failure Detection + +--- + +*This documentation is maintained by the contributors of the individual components. +We kindly ask anyone that adds and changes components to eventually provide a patch +or pull request that updates these documents as well.* diff --git a/docs/internal_overview.md b/docs/internal_howto.md similarity index 66% rename from docs/internal_overview.md rename to docs/internal_howto.md index 88fe6f6b8ab..415bda72604 100644 --- a/docs/internal_overview.md +++ b/docs/internal_howto.md @@ -20,39 +20,20 @@ specific language governing permissions and limitations under the License. --> -This documentation provides an overview of the architecture of the Flink system -and its components. It is intended as guide to contributors, and people +This documentation provides an overview of the "How-To's" for +Flink developers. It is intended as guide to contributors and people that are interested in the technology behind Flink. -*This documentation is maintained by the contributors of the individual components. -We kindly ask anyone that adds and changes components to eventually provide a patch -or pull request that updates these documents as well.* - ## Architectures and Components -- [General Architecture and Process Model](internal_general_arch.html) - - - -- [Distributed Communication via Akka](internal_distributed_akka.html) - - +- [How-to: Using logging in Flink](internal_logging.html) -- [How-to: Adding a new Operator](internal_add_operator.html) +--- - +*This documentation is maintained by the contributors of the individual components. +We kindly ask anyone that adds and changes components to eventually provide a patch +or pull request that updates these documents as well.* -- [How-to: Using logging in Flink](internal_logging.html) diff --git a/docs/internal_job_scheduling.md b/docs/internal_job_scheduling.md index 1237c85911b..7fb45001423 100644 --- a/docs/internal_job_scheduling.md +++ b/docs/internal_job_scheduling.md @@ -19,3 +19,64 @@ KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> + +This document briefly describes how Flink schedules jobs and +how it represents and tracks job status on the JobManager. + +* This will be replaced by the TOC +{:toc} + + +## Scheduling + +Execution resources in Flink are defined through _Task Slots_. Each TaskManager will have one or more task slots, +each of which can run one pipeline of parallel tasks. A pipeline consists of multiple successive tasks, such as the +*n-th* parallel instance of a MapFunction together with the *n-th* parallel instance of a ReduceFunction. +Note that Flink often executes successive tasks concurrently: For Streaming programs, that happens in any case, +but also for batch programs, it happens frequently. + +The figure below illustrates that. Consider a program with a data source, a *MapFunction*, and a *ReduceFunctoin*. +The source and MapFunction are executed with a parallelism of 4, while the ReduceFunction is executed with a +parallism of 3. A pipeline consists of the sequence Source - Map - Reduce. On a cluster with 2 TaskManagers with +3 slots each, the program will be executed as described below. + +
+Assigning Pipelines of Tasks to Slots +
+ +Internally, Flink defines through [SlotSharingGroup](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/SlotSharingGroup.java) +and [CoLocationGroup](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java) +which tasks may share a slot (permissive), respectively which tasks must be strictly placed into the same slot. + + +## JobManager Data Structures + +During job execution, the JobManager keeps track of distributed tasks, decides when to schedule the next task (or set of tasks), +and reacts to finished tasks or execution failures. + +The JobManager receives the [JobGraph](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/), +which is a representation of the data flow consisting of operators ([JobVertex](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/AbstractJobVertex.java)) +and intermediate results ([IntermediateDataSet](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/IntermediateDataSet.java)). +Each operator has properies, like the degree of parallelism and the code that it executes. +In addition, the JobGraph has a set of attached libraries, that are neccessary to execute the code of the operators. + +The JobManager transforms the JobGraph into an [ExecutionGraph](https://github.com/apache/incubator-flink/tree/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph). +The ExecutionGraph is a parallel version of the JobGraph: For each JobVertex, it contains an [ExecutionVertex](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java) per parallel subtask. An operator with a parallelism of 100 will have one JobVertex and 100 ExecutionVertices. +The ExecutionVertex tracks the state of execution of a particular subtask. All ExecutionVertices from one JobVertex are held in an +[ExecutionJobVertex](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionJobVertex.java), +which tracks the status of the operator as a whole. +Besides the vertices, the ExecutionGraph also contains the [IntermediateResult](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResult.java) and the [IntermediateResultPartition](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartition.java). The former tracks the state of the *IntermediateDataSet*, the latter the state of each of its partitions. + +
+JobGraph and ExecutionGraph +
+ +During its execution, each parallel task goes through multiple stages, from *created* to *finished* or *failed*. The diagram below illustrates the +states and possible transitions between them. A task may be executed multiple times (for example in the course of failure recovery). +For that reason, the execution of an ExecutionVertex is tracked in an [Execution](https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java). Each ExecutionVertex has a current Execution, and prior Executions. + +
+States and Transitions of Task Executions +
+ + diff --git a/docs/internal_logging.md b/docs/internal_logging.md index 79e3656149e..42fd3171f42 100644 --- a/docs/internal_logging.md +++ b/docs/internal_logging.md @@ -86,4 +86,11 @@ Placeholders can also be used in conjunction with exceptions which shall be logg catch(Exception exception){ LOG.error("An {} occurred.", "error", exception); } -~~~ \ No newline at end of file +~~~ + +--- + +*This documentation is maintained by the contributors of the individual components. +We kindly ask anyone that adds and changes components to eventually provide a patch +or pull request that updates these documents as well.* + -- GitLab