The following example programs showcase different applications of Flink from simple word counting to graph algorithms. The code samples illustrate the use of [Flink’s DataSet API](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/batch/index.html).
The full source code of these and other examples can be found in the [flink-examples-batch](https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch) module of the Flink source repository.
To run a Flink example, you need a running Flink instance. The “Quickstart” and “Setup” tabs in the navigation describe various ways of starting Flink.
To run the WordCount example, issue the following command:
...
...
@@ -24,9 +24,9 @@ To run the WordCount example, issue the following command:
The other examples can be started in a similar way.
Note that many examples run without any arguments by using built-in data. To run WordCount with real data, you have to pass the path to the data:
@@ -36,11 +36,11 @@ Note that many examples run without passing any arguments for them, by using bui
Note that non-local file systems require a scheme prefix, such as `hdfs://`.
## Word Count
WordCount is the “Hello World” of Big Data processing systems. It computes the frequency of words in a text collection. The algorithm works in two steps: First, the text is split into individual words. Second, the words are grouped and counted.
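The two steps above can be sketched in plain Java without any Flink dependency. This is only an illustration of the logic, not the code of the Flink example: the class name `WordCountSketch`, the method `countWords`, and the tokenization rule (lowercasing and splitting on non-word characters) are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the two WordCount steps; it does not use the Flink API.
class WordCountSketch {

    // Returns the number of occurrences of each word, sorted by word.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Step 1: split the text into individual words
            // (assumed rule: lowercase, split on runs of non-word characters).
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    // Step 2: group equal words and add 1 per occurrence.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(countWords(Arrays.asList("To be, or not to be")));
    }
}
```

In a distributed setting, step 1 corresponds to a flat-map over the input lines and step 2 to a group-by on the word followed by a sum.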
@@ -78,7 +78,7 @@ public static class Tokenizer implements FlatMapFunction<String, Tuple2<String,
The Java [WordCount example](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java) implements the algorithm described above with input parameters: `--input <path> --output <path>`. As test data, any text file will do.
The Scala [WordCount example](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/wordcount/WordCount.scala) implements the algorithm described above with input parameters: `--input <path> --output <path>`. As test data, any text file will do.
## Page Rank
The PageRank algorithm computes the “importance” of pages in a graph defined by links, which point from one page to another. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. In each iteration, each page distributes its current rank over all its neighbors and computes its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine, which uses the importance of webpages to rank the results of search queries.
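The iteration described above can be sketched in plain Java as follows. This is a minimal single-machine illustration, not the Flink program: the class name `PageRankSketch`, the adjacency-array input format, and the dampening factor of 0.85 are assumptions of this sketch, and the “taxed sum” is interpreted here as the usual damped sum. The sketch also assumes that every page has at least one outgoing link.

```java
import java.util.Arrays;

// Plain-Java illustration of the PageRank iteration; it does not use the Flink API.
class PageRankSketch {

    // Assumed dampening factor; the remaining 15% of rank is spread evenly over all pages.
    static final double DAMPENING = 0.85;

    // links[p] lists the pages that page p points to.
    // Assumption: every page has at least one outgoing link.
    static double[] pageRank(int[][] links, int iterations) {
        int n = links.length;
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n); // start with a uniform rank

        for (int i = 0; i < iterations; i++) {
            double[] received = new double[n];
            // Each page distributes its current rank evenly over its neighbors.
            for (int p = 0; p < n; p++) {
                double share = ranks[p] / links[p].length;
                for (int target : links[p]) {
                    received[target] += share;
                }
            }
            // Each page computes its new rank as a damped sum of the received ranks.
            for (int p = 0; p < n; p++) {
                ranks[p] = DAMPENING * received[p] + (1 - DAMPENING) / n;
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
        int[][] links = {{1, 2}, {2}, {0}};
        System.out.println(Arrays.toString(pageRank(links, 20)));
    }
}
```

Because every page passes on its whole rank, the ranks keep summing to 1 across iterations in this sketch.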
@@ -182,7 +182,7 @@ public static final class EpsilonFilter
The Java [PageRank program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/graph/PageRank.java) implements the above example. It requires the following parameters to run: `--pages <path> --links <path> --output <path> --numPages <n> --iterations <n>`.
The Scala [PageRank program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/graph/PageRankBasic.scala) implements the above example. It requires the following parameters to run: `--pages <path> --links <path> --output <path> --numPages <n> --iterations <n>`.
This simple implementation requires that each page has at least one incoming and one outgoing link (a page can point to itself).
## Connected Components
The Connected Components algorithm identifies the connected parts of a larger graph by assigning the same component ID to all vertices in the same connected part. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts a component ID from a neighbor if it is smaller than its own component ID.
This implementation uses a [delta iteration](iterations.html): Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices.
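The idea behind the delta iteration can be sketched in plain Java with a workset that holds only the vertices whose component ID changed in the previous step. This is an illustration under stated assumptions, not the Flink implementation: the class name `ConnectedComponentsSketch`, the vertex numbering from 0 to `numVertices - 1`, and the choice of each vertex's own index as its initial component ID are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of Connected Components with a workset; it does not use the Flink API.
class ConnectedComponentsSketch {

    // Vertices are numbered 0..numVertices-1; each edge {a, b} is undirected.
    static int[] components(int numVertices, int[][] edges) {
        List<List<Integer>> neighbors = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) {
            neighbors.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            neighbors.get(e[0]).add(e[1]);
            neighbors.get(e[1]).add(e[0]);
        }

        // Assumption: every vertex starts with its own index as component ID.
        int[] component = new int[numVertices];
        Set<Integer> workset = new HashSet<>();
        for (int v = 0; v < numVertices; v++) {
            component[v] = v;
            workset.add(v);
        }

        while (!workset.isEmpty()) {
            // Only vertices in the workset propagate their ID to their neighbors;
            // for each neighbor, keep the smallest ID offered in this step.
            Map<Integer, Integer> offered = new HashMap<>();
            for (int v : workset) {
                for (int n : neighbors.get(v)) {
                    offered.merge(n, component[v], Integer::min);
                }
            }
            // A vertex accepts an offered ID only if it is smaller than its own.
            // Only vertices that changed take part in the next step.
            Set<Integer> changed = new HashSet<>();
            for (Map.Entry<Integer, Integer> offer : offered.entrySet()) {
                if (offer.getValue() < component[offer.getKey()]) {
                    component[offer.getKey()] = offer.getValue();
                    changed.add(offer.getKey());
                }
            }
            workset = changed;
        }
        return component;
    }

    public static void main(String[] args) {
        // Two components: {0, 1, 2} and {3, 4}.
        int[][] edges = {{0, 1}, {1, 2}, {3, 4}};
        // prints [0, 0, 0, 3, 3]
        System.out.println(Arrays.toString(components(5, edges)));
    }
}
```

The loop ends when no vertex changes its ID, so later steps touch only the few vertices that are still converging.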
@@ -336,7 +336,7 @@ public static final class ComponentIdFilter
The Java [ConnectedComponents program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/graph/ConnectedComponents.java) implements the above example. It requires the following parameters to run: `--vertices <path> --edges <path> --output <path> --iterations <n>`.
The Scala [ConnectedComponents program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/graph/ConnectedComponents.scala) implements the above example. It requires the following parameters to run: `--vertices <path> --edges <path> --output <path> --iterations <n>`.