Commit c9088a49 authored by Kostas Tzoumas, committed by Aljoscha Krettek

[FLINK-2779] Update documentation to reflect new Stream/Window API

Parent 9a21ab11
@@ -76,22 +76,22 @@ under the License.
<li class="dropdown{% if page.url contains '/apis/' %} active{% endif %}">
<a href="{{ apis }}" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Programming Guides <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="{{ apis }}/programming_guide.html"><strong>Batch: DataSet API</strong></a></li>
<li><a href="{{ apis }}/streaming_guide.html"><strong>Streaming: DataStream API</strong> <span class="badge">Beta</span></a></li>
<li><a href="{{ apis }}/programming_guide.html"><strong>DataSet API</strong></a></li>
<li><a href="{{ apis }}/streaming_guide.html"><strong>DataStream API</strong></a></li>
<li><a href="{{ apis }}/python.html">Python API <span class="badge">Beta</span></a></li>
<li class="divider"></li>
<li><a href="{{ apis }}/scala_shell.html">Interactive Scala Shell</a></li>
<li><a href="{{ apis }}/dataset_transformations.html">Dataset Transformations</a></li>
<li><a href="{{ apis }}/dataset_transformations.html">DataSet Transformations</a></li>
<li><a href="{{ apis }}/best_practices.html">Best Practices</a></li>
<li><a href="{{ apis }}/example_connectors.html">Connectors (Batch)</a></li>
<li><a href="{{ apis }}/kafka.html">Kafka Connector <span class="badge">Beta</span></a></li>
<li><a href="{{ apis }}/example_connectors.html">Connectors (DataSet API)</a></li>
<!--<li><a href="{{ apis }}/kafka.html">Kafka Connector <span class="badge">Beta</span></a></li>-->
<li><a href="{{ apis }}/examples.html">Examples</a></li>
<li><a href="{{ apis }}/local_execution.html">Local Execution</a></li>
<li><a href="{{ apis }}/cluster_execution.html">Cluster Execution</a></li>
<li><a href="{{ apis }}/cli.html">Command Line Interface</a></li>
<li><a href="{{ apis }}/web_client.html">Web Client</a></li>
<li><a href="{{ apis }}/iterations.html">Iterations</a></li>
<li><a href="{{ apis }}/iterations.html">Iterations (DataSet API)</a></li>
<li><a href="{{ apis }}/java8.html">Java 8</a></li>
<li><a href="{{ apis }}/hadoop_compatibility.html">Hadoop Compatibility <span class="badge">Beta</span></a></li>
<li><a href="{{ apis }}/storm_compatibility.html">Storm Compatibility <span class="badge">Beta</span></a></li>
---
title: "Flink Programming Guide"
title: "Flink DataSet API Programming Guide"
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
@@ -22,14 +22,14 @@ under the License.
<a href="#top"></a>
-Analysis programs in Flink are regular programs that implement transformations on data sets
+DataSet programs in Flink are regular programs that implement transformations on data sets
(e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain
sources (e.g., by reading files, or from local collections). Results are returned via sinks, which may for
example write the data to (distributed) files, or to standard output (for example the command line
terminal). Flink programs run in a variety of contexts, standalone, or embedded in other programs.
The execution can happen in a local JVM, or on clusters of many machines.
-In order to create your own Flink program, we encourage you to start with the
+In order to create your own Flink DataSet program, we encourage you to start with the
[program skeleton](#program-skeleton) and gradually add your own
[transformations](#transformations). The remaining sections act as references for additional
operations and advanced features.
@@ -221,7 +221,7 @@ Program Skeleton
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
-As we already saw in the example, Flink programs look like regular Java
+As we already saw in the example, Flink DataSet programs look like regular Java
programs with a `main()` method. Each program consists of the same basic parts:
1. Obtain an `ExecutionEnvironment`,
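For illustration, a minimal sketch of a complete program with these parts (assuming the standard Java DataSet API; not part of this commit's diff):

{% highlight java %}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountSkeleton {
    public static void main(String[] args) throws Exception {
        // 1. Obtain an ExecutionEnvironment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // 2. Load/create the initial data
        DataSet<String> text = env.fromElements("to be", "or not to be");

        // 3. Specify transformations on this data
        DataSet<Tuple2<String, Integer>> counts = text
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.split(" ")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .groupBy(0)
            .sum(1);

        // 4. Specify where to put the results; print() also triggers execution.
        counts.print();
    }
}
{% endhighlight %}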
@@ -233,7 +233,7 @@ programs with a `main()` method. Each program consists of the same basic parts:
We will now give an overview of each of those steps, please refer to the respective sections for
more details. Note that all core classes of the Java API are found in the package {% gh_link /flink-java/src/main/java/org/apache/flink/api/java "org.apache.flink.api.java" %}.
-The `ExecutionEnvironment` is the basis for all Flink programs. You can
+The `ExecutionEnvironment` is the basis for all Flink DataSet programs. You can
obtain one using these static methods on class `ExecutionEnvironment`:
{% highlight java %}
@@ -253,7 +253,7 @@ Typically, you only need to use `getExecutionEnvironment()`, since this
will do the right thing depending on the context: if you are executing
your program inside an IDE or as a regular Java program it will create
a local environment that will execute your program on your local machine. If
-you created a JAR file from you program, and invoke it through the [command line](cli.html)
+you created a JAR file from your program, and invoke it through the [command line](cli.html)
or the [web interface](web_client.html),
the Flink cluster manager will execute your main method and `getExecutionEnvironment()` will return
an execution environment for executing your program on a cluster.
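For illustration, a sketch of the common factory methods (the host name, port, and JAR path below are placeholders):

{% highlight java %}
// Picks the right environment for the current context (IDE, local, or cluster).
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Explicitly local execution, e.g. for debugging in the IDE.
ExecutionEnvironment localEnv = ExecutionEnvironment.createLocalEnvironment();

// Explicitly remote execution against a running cluster (placeholder values).
ExecutionEnvironment remoteEnv =
    ExecutionEnvironment.createRemoteEnvironment("jobmanager-host", 6123, "/path/to/your-program.jar");
{% endhighlight %}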
@@ -276,7 +276,7 @@ more information on data sources and input formats, please refer to
Once you have a DataSet you can apply transformations to create a new
DataSet which you can then write to a file, transform again, or
combine with other DataSets. You apply transformations by calling
-methods on DataSet with your own custom transformation function. For example,
+methods on DataSet with your own custom transformation functions. For example,
a map transformation looks like this:
{% highlight java %}
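// Illustrative sketch (not taken from this commit's diff): parse each
// String element of a DataSet into an Integer.
DataSet<String> input = env.fromElements("1", "2", "3");

DataSet<Integer> parsed = input.map(new MapFunction<String, Integer>() {
    @Override
    public Integer map(String value) {
        return Integer.parseInt(value);
    }
});
{% endhighlight %}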
@@ -447,18 +447,14 @@ accessed from the `getLastJobExecutionResult()` method.
DataSet abstraction
---------------
-The batch processing APIs of Flink are centered around the `DataSet` abstraction. A `DataSet` is only
-an abstract representation of a set of data that can contain duplicates.
-Also note that Flink is not always physically creating (materializing) each DataSet at runtime. This
-depends on the used runtime, the configuration and optimizer decisions.
-The Flink runtime does not need to always materialize the DataSets because it is using a streaming runtime model.
-DataSets are only materialized to avoid distributed deadlocks (at points where the data flow graph branches out and joins again later) or if the execution mode has explicitly been set to a batched execution.
+A `DataSet` is an abstract representation of a finite immutable collection of data of the same type that may contain duplicates.
-When using Flink on Tez, all DataSets are materialized.
+Note that Flink is not always physically creating (materializing) each DataSet at runtime. This
+depends on the used runtime, the configuration and optimizer decisions. DataSets may be "streamed through"
+operations during execution, as under the hood Flink uses a streaming data processing engine.
+Some DataSets are materialized automatically to avoid distributed deadlocks (at points where the data flow graph branches
+out and joins again later) or if the execution mode has explicitly been set to blocking execution.
[Back to top](#top)
@@ -466,7 +462,7 @@ When using Flink on Tez, all DataSets are materialized.
Lazy Evaluation
---------------
-All Flink programs are executed lazily: When the program's main method is executed, the data loading
+All Flink DataSet programs are executed lazily: When the program's main method is executed, the data loading
and transformations do not happen directly. Rather, each operation is created and added to the
program's plan. The operations are actually executed when the execution is explicitly triggered by
an `execute()` call on the ExecutionEnvironment object. Also, `collect()` and `print()` will trigger
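A sketch of the lazy-evaluation contract (illustrative, assuming the Java DataSet API; not part of this commit's diff):

{% highlight java %}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Nothing runs yet: these calls only add operations to the program's plan.
DataSet<Integer> doubled = env
    .fromElements(1, 2, 3)
    .map(new MapFunction<Integer, Integer>() {
        @Override
        public Integer map(Integer value) {
            return value * 2;
        }
    });

// Execution is triggered here: print() runs the plan and prints the result.
doubled.print();
{% endhighlight %}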
@@ -1323,7 +1319,7 @@ data.map (new MyMapFunction());
#### Anonymous classes
-You can pass a function as an anonmymous class:
+You can pass a function as an anonymous class:
{% highlight java %}
data.map(new MapFunction<String, Integer> () {
public Integer map(String value) { return Integer.parseInt(value); }
@@ -1451,7 +1447,7 @@ for a complete example.
Data Types
----------
-Flink places some restrictions on the type of elements that are used in DataSets and as results
+Flink places some restrictions on the type of elements that are used in DataSets and in results
of transformations. The reason for this is that the system analyzes the types to determine
efficient execution strategies.
@@ -1473,7 +1469,7 @@ Tuples are composite types that contain a fixed number of fields with various ty
The Java API provides classes from `Tuple1` up to `Tuple25`. Every field of a tuple
can be an arbitrary Flink type including further tuples, resulting in nested tuples. Fields of a
tuple can be accessed directly using the field's name as `tuple.f4`, or using the generic getter method
-`tuple.getField(int position)`. The field indicies start at 0. Note that this stands in contrast
+`tuple.getField(int position)`. The field indices start at 0. Note that this stands in contrast
to the Scala tuples, but it is more consistent with Java's general indexing.
{% highlight java %}
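// Illustrative tuple sketch (not taken from this commit's diff).
Tuple2<String, Integer> person = new Tuple2<>("Alice", 42);

// Direct, typed access via the field names f0, f1, ...
String name = person.f0;
Integer age = person.f1;

// Generic getter; field indices start at 0.
String alsoName = person.getField(0);
{% endhighlight %}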
@@ -2239,7 +2235,7 @@ result data. This section give some hints how to ease the development of Flink p
### Local Execution Environment
A `LocalEnvironment` starts a Flink system within the same JVM process it was created in. If you
-start the LocalEnvironement from an IDE, you can set breakpoint in your code and easily debug your
+start the LocalEnvironment from an IDE, you can set breakpoints in your code and easily debug your
program.
A LocalEnvironment is created and used as follows:
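A sketch of such usage (illustrative; the file paths are placeholders, not part of this commit's diff):

{% highlight java %}
// Starts a Flink system within the same JVM, e.g. for debugging in an IDE.
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

DataSet<String> lines = env.readTextFile("file:///path/to/input");  // placeholder path
lines.filter(new FilterFunction<String>() {
        @Override
        public boolean filter(String value) {
            return value.startsWith("http://");
        }
    })
    .writeAsText("file:///path/to/result");  // placeholder path

env.execute("Local filter sketch");
{% endhighlight %}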
@@ -2957,7 +2953,7 @@ public static final class Tokenizer extends RichFlatMapFunction<String, Tuple2<S
[Back to top](#top)
-Program Packaging & Distributed Execution
+Program Packaging and Distributed Execution
-----------------------------------------
As described in the [program skeleton](#program-skeleton) section, Flink programs can be executed on
This diff is collapsed.
@@ -20,18 +20,17 @@ specific language governing permissions and limitations
under the License.
-->
-Apache Flink is a platform for efficient, distributed, general-purpose data processing.
-It features powerful programming abstractions in Java and Scala, a high-performance runtime, and
-automatic program optimization. It has native support for iterations, incremental iterations, and
-programs consisting of large DAGs of operations.
+Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is
+a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed
+computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying
+native iteration support, managed memory, and program optimization.

-If you quickly want to try out the system, please look at one of the available quickstarts. For
-a thorough introduction of the Flink API please refer to the
-[Programming Guide](apis/programming_guide.html).
+If you want to write your first program, look at one of the available quickstarts, and refer to the
+[DataSet API guide](apis/programming_guide.html) or the [DataStream API guide](apis/streaming_guide.html).
## Stack
-This is an overview of Flink's stack. Click on any component to go to the respective documentation.
+This is an overview of Flink's stack. Click on any component to go to the respective documentation page.
<img src="fig/overview-stack-0.9.png" width="893" height="450" alt="Stack" usemap="#overview-stack">
@@ -62,13 +61,10 @@ This documentation is for Apache Flink version {{ site.version }}, which is the
You can download the latest pre-built snapshot version from the [downloads]({{ site.download_url }}#latest) page of the [project website]({{ site.website_url }}).
-The Scala API uses Scala {{ site.scala_version }}. Please make sure to use a compatible version.
+<!--The Scala API uses Scala {{ site.scala_version }}. Please make sure to use a compatible version.
-Basically, the Scala API uses Scala 2.10. But you can use the API with Scala 2.11. To use Flink with
+The Scala API uses Scala 2.10, but you can use the API with Scala 2.11. To use Flink with
Scala 2.11, please check [build guide](/setup/building.html#build-flink-for-scala-211)
-and [programming guide](/apis/programming_guide.html#scala-dependency-versions).
+and [programming guide](/apis/programming_guide.html#scala-dependency-versions).-->
## Flink Architecture
<img src="fig/process_model.svg" width="100%" alt="Flink Process Model">
@@ -32,31 +32,53 @@ up within the same JVM.
When a program is submitted, a client is created that performs the pre-processing and turns the program
into the parallel data flow form that is executed by the JobManager and TaskManagers. The figure below
-illustrates the different actors in the system very coarsely.
+illustrates the different actors in the system and their interactions.
<div style="text-align: center;">
<img src="fig/ClientJmTm.svg" alt="The Interactions between Client, JobManager and TaskManager" height="400px" style="text-align: center;"/>
<img src="../fig/process_model.svg" width="100%" alt="Flink Process Model">
</div>
## Component Stack
-An alternative view on the system is given by the stack below. The different layers of the stack build on
+As a software stack, Flink is a layered system. The different layers of the stack build on
top of each other and raise the abstraction level of the program representations they accept:
- The **runtime** layer receives a program in the form of a *JobGraph*. A JobGraph is a generic parallel
data flow with arbitrary tasks that consume and produce data streams.
-- The **optimizer** and **common api** layer takes programs in the form of operator DAGs. The operators are
-specific (e.g., Map, Join, Filter, Reduce, ...), but are data type agnostic. The concrete types and their
-interaction with the runtime is specified by the higher layers.
+- Both the **DataStream API** and the **DataSet API** generate JobGraphs through separate compilation
+processes. The DataSet API uses an optimizer to determine the optimal plan for the program, while
+the DataStream API uses a stream builder.
-- The **API layer** implements multiple APIs that create operator DAGs for their programs. Each API needs
-to provide utilities (serializers, comparators) that describe the interaction between its data types and
-the runtime.
- The JobGraph is executed according to a variety of deployment options available in Flink (e.g., local,
remote, YARN, etc)
<div style="text-align: center;">
<img src="fig/stack.svg" alt="The Flink component stack" width="800px" />
</div>
- Libraries and APIs that are bundled with Flink generate DataSet or DataStream API programs. These are
Table for queries on logical tables, FlinkML for Machine Learning, and Gelly for graph processing.
+You can click on the components in the figure to learn more.
+
+<img src="../fig/overview-stack-0.9.png" width="893" height="450" alt="Stack" usemap="#overview-stack">
+
+<map name="overview-stack">
+<area shape="rect" coords="188,0,263,200" alt="Graph API: Gelly" href="libs/gelly_guide.html">
+<area shape="rect" coords="268,0,343,200" alt="Flink ML" href="libs/ml/">
+<area shape="rect" coords="348,0,423,200" alt="Table" href="libs/table.html">
+<area shape="rect" coords="188,205,538,260" alt="DataSet API (Java/Scala)" href="apis/programming_guide.html">
+<area shape="rect" coords="543,205,893,260" alt="DataStream API (Java/Scala)" href="apis/streaming_guide.html">
+<!-- <area shape="rect" coords="188,275,538,330" alt="Optimizer" href="optimizer.html"> -->
+<!-- <area shape="rect" coords="543,275,893,330" alt="Stream Builder" href="streambuilder.html"> -->
+<area shape="rect" coords="188,335,893,385" alt="Flink Runtime" href="internals/general_arch.html">
+<area shape="rect" coords="188,405,328,455" alt="Local" href="apis/local_execution.html">
+<area shape="rect" coords="333,405,473,455" alt="Remote" href="apis/cluster_execution.html">
+<area shape="rect" coords="478,405,638,455" alt="Embedded" href="apis/local_execution.html">
+<area shape="rect" coords="643,405,765,455" alt="YARN" href="setup/yarn_setup.html">
+<area shape="rect" coords="770,405,893,455" alt="Tez" href="setup/flink_on_tez.html">
+</map>
## Projects and Dependencies