Unverified commit c4462d6a, authored by 片刻小哥哥, committed by GitHub

Merge pull request #34 from zyBourn/feature/flink_1.7_doc_zh_8

Completed part 8.
# Batch Examples
The following example programs showcase different applications of Flink from simple word counting to graph algorithms. The code samples illustrate the use of [Flink’s DataSet API](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/batch/index.html).
The full source code of the following and more examples can be found in the [flink-examples-batch](https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch) module of the Flink source repository.
## Running an example
In order to run a Flink example, we assume you have a running Flink instance available. The “Quickstart” and “Setup” tabs in the navigation describe various ways of starting Flink.
The easiest way is to run `./bin/start-cluster.sh`, which by default starts a local cluster with one JobManager and one TaskManager.
Each binary release of Flink contains an `examples` directory with jar files for each of the examples on this page.
To run the WordCount example, issue the following command:
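The command itself was elided from this diff hunk. In a binary release of Flink, the invocation would be along these lines (the jar path is the default location in the distribution and may differ in your setup):

```
./bin/flink run ./examples/batch/WordCount.jar
```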
The other examples can be started in a similar way.
Note that many examples run without passing any arguments, by using built-in data. To run WordCount with real data, you have to pass the path to the data:
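The concrete command was likewise elided in this diff hunk; a typical invocation, assuming the default jar location and placeholder paths, looks like:

```
./bin/flink run ./examples/batch/WordCount.jar \
  --input /path/to/some/text/data --output /path/to/result
```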
Note that non-local file systems require a schema prefix, such as `hdfs://`.
## Word Count
WordCount is the “Hello World” of Big Data processing systems. It computes the frequency of words in a text collection. The algorithm works in two steps: First, the text is split into individual words. Second, the words are grouped and counted.
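The Flink snippet that belongs here is elided in this diff. As a conceptual stand-in, the two steps can be sketched in plain Java without the DataSet API (class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the two WordCount steps: split, then group and count.
// Illustrative only; the actual example uses Flink's DataSet API.
public class WordCountSketch {
    public static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Step 1: split the text into individual words
        // (lower-cased, with non-word characters as delimiters).
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            // Step 2: group by word and count occurrences.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```

In the Flink version, step 1 is a `FlatMapFunction` (the `Tokenizer`) and step 2 is a `groupBy` followed by a sum aggregation.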
The [WordCount example](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java) implements the above described algorithm with input parameters: `--input <path> --output <path>`. As test data, any text file will do.
The [WordCount example](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/wordcount/WordCount.scala) implements the above described algorithm with input parameters: `--input <path> --output <path>`. As test data, any text file will do.
## Page Rank
The PageRank algorithm computes the “importance” of pages in a graph defined by links, which point from one page to another. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. In each iteration, each page distributes its current rank over all its neighbors, and computes its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine, which uses the importance of webpages to rank the results of search queries.
In this simple example, PageRank is implemented with a [bulk iteration](iterations.html) and a fixed number of iterations.
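The iteration scheme can be sketched in plain Java with a fixed iteration count (the damping factor 0.85 and all names are illustrative; the actual example uses Flink's DataSet API and a bulk iteration):

```java
import java.util.Arrays;

// Plain-Java sketch of the fixed-iteration PageRank scheme described above.
public class PageRankSketch {
    // links[i] lists the pages that page i points to; as stated below,
    // every page is assumed to have at least one outgoing link.
    public static double[] pageRank(int[][] links, int numIterations) {
        int n = links.length;
        double damping = 0.85; // illustrative choice of damping factor
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n); // start with a uniform rank distribution
        for (int iter = 0; iter < numIterations; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n); // the "tax" every page keeps
            for (int page = 0; page < n; page++) {
                // distribute the page's current rank evenly over its neighbors
                double share = ranks[page] / links[page].length;
                for (int target : links[page]) {
                    next[target] += damping * share;
                }
            }
            ranks = next;
        }
        return ranks;
    }
}
```

Because each iteration redistributes the full rank mass, the ranks keep summing to 1 (up to floating-point error).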
The [PageRank program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/graph/PageRank.java) implements the above example. It requires the following parameters to run: `--pages <path> --links <path> --output <path> --numPages <n> --iterations <n>`.
The [PageRank program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/graph/PageRankBasic.scala) implements the above example. It requires the following parameters to run: `--pages <path> --links <path> --output <path> --numPages <n> --iterations <n>`.
Input files are plain text files and must be formatted as follows:
* Pages represented as a (long) ID separated by new-line characters.
* For example `"1\n2\n12\n42\n63\n"` gives five pages with IDs 1, 2, 12, 42, and 63.
* Links are represented as pairs of page IDs which are separated by space characters. Links are separated by new-line characters:
* For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (directed) links (1)->(2), (2)->(12), (1)->(12), and (42)->(63).
For this simple implementation it is required that each page has at least one incoming and one outgoing link (a page can point to itself).
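Reading the two formats above can be sketched in a few lines of plain Java (the helper names are made up for illustration and are not part of the Flink example):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of parsing the pages and links input formats.
public class PageRankInput {
    // Pages file: one (long) page ID per line.
    public static List<Long> parsePages(String text) {
        List<Long> pages = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) pages.add(Long.parseLong(line));
        }
        return pages;
    }

    // Links file: "<sourceId> <targetId>" pairs, one per line.
    public static List<long[]> parseLinks(String text) {
        List<long[]> links = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) continue;
            String[] parts = line.split(" ");
            links.add(new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])});
        }
        return links;
    }
}
```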
## Connected Components
The Connected Components algorithm identifies parts of a larger graph which are connected by assigning all vertices in the same connected part the same component ID. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts the component ID from a neighbor, if it is smaller than its own component ID.
This implementation uses a [delta iteration](iterations.html): Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices.
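The propagation rule, including the delta-iteration idea of re-processing only vertices whose component ID changed, can be sketched in plain Java with a worklist (illustrative only; the actual example uses Flink's delta iteration):

```java
import java.util.*;

// Plain-Java sketch of Connected Components: each vertex starts with its
// own ID as component ID and repeatedly adopts the smallest component ID
// seen from a neighbor. Only changed vertices propagate again.
public class ConnectedComponentsSketch {
    public static Map<Integer, Integer> components(int[] vertices, int[][] edges) {
        Map<Integer, Integer> component = new HashMap<>();
        Map<Integer, List<Integer>> neighbors = new HashMap<>();
        for (int v : vertices) {
            component.put(v, v);                 // initial component ID = own ID
            neighbors.put(v, new ArrayList<>());
        }
        for (int[] e : edges) {                  // edges are undirected
            neighbors.get(e[0]).add(e[1]);
            neighbors.get(e[1]).add(e[0]);
        }
        Deque<Integer> workset = new ArrayDeque<>();
        for (int v : vertices) workset.add(v);
        while (!workset.isEmpty()) {
            int v = workset.poll();
            int id = component.get(v);
            for (int u : neighbors.get(v)) {
                if (component.get(u) > id) {     // accept the smaller component ID
                    component.put(u, id);
                    workset.add(u);              // u changed, so it propagates again
                }
            }
        }
        return component;
    }
}
```

The worklist plays the role of Flink's workset: vertices that kept their ID are never revisited, which mirrors why the later delta iterations are cheap.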
The [ConnectedComponents program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/graph/ConnectedComponents.java) implements the above example. It requires the following parameters to run: `--vertices <path> --edges <path> --output <path> --iterations <n>`.
The [ConnectedComponents program](https://github.com/apache/flink/blob/master//flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/graph/ConnectedComponents.scala) implements the above example. It requires the following parameters to run: `--vertices <path> --edges <path> --output <path> --iterations <n>`.
Input files are plain text files and must be formatted as follows:
* Vertices represented as IDs and separated by new-line characters.
* For example `"1\n2\n12\n42\n63\n"` gives five vertices (1), (2), (12), (42), and (63).
* Edges are represented as pairs of vertex IDs which are separated by space characters. Edges are separated by new-line characters:
* For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (undirected) links (1)-(2), (2)-(12), (1)-(12), and (42)-(63).