Commit 6f1fc7f9 authored by 取昵称好难啊

Fix markdown syntax

Parent d666601d
@@ -253,7 +253,7 @@ JavaSerializer | Used for serializing objects that will be sent over the network or need to be in serialized form
* Mesos fine-grained mode: 8
* Others: total number of cores on all executor nodes, or 2, whichever is larger
| If not set by the user, this property is the default number of partitions in RDDs returned by transformations like `join`, `reduceByKey`, and `parallelize`. |
| `spark.executor.heartbeatInterval` | 10s | Interval between each executor's heartbeats to the driver. Heartbeats let the driver know that the executor is still alive and update it with metrics for in-progress tasks. |
| `spark.files.fetchTimeout` | 60s | Communication timeout to use when fetching files added through SparkContext.addFile() from the driver. |
| `spark.files.useFetchCache` | true | If set to true (default), file fetching will use a local cache that is shared by executors belonging to the same application, which can improve task launching performance when running many executors on the same host. If set to false, these caching optimizations will be disabled and all executors will fetch their own copies of files. This optimization can be disabled in order to use Spark local directories that reside on NFS filesystems (see [SPARK-6313](https://issues.apache.org/jira/browse/SPARK-6313) for more details). |
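As a rough illustration of how properties like those above can be supplied programmatically (they can equally be passed to `spark-submit` with `--conf`), here is a minimal Scala sketch; the application name and all values are illustrative, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; tune them for your own workload and cluster.
val conf = new SparkConf()
  .setAppName("ConfigExample")
  .set("spark.default.parallelism", "200")        // default partition count for join/reduceByKey/parallelize
  .set("spark.executor.heartbeatInterval", "10s") // executor-to-driver heartbeat interval
  .set("spark.files.useFetchCache", "false")      // e.g. when Spark local dirs live on NFS (SPARK-6313)

val sc = new SparkContext(conf)
```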
@@ -321,14 +321,10 @@ It also allows a different address from the local one to be advertised to execut
| Property Name | Default | Meaning |
| --- | --- | --- |
| `spark.dynamicAllocation.enabled` | false | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. For more detail, see the description [here](job-scheduling.html#dynamic-resource-allocation).
This requires `spark.shuffle.service.enabled` to be set. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, and `spark.dynamicAllocation.initialExecutors`. |
| `spark.dynamicAllocation.enabled` | false | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. For more detail, see the description [here](job-scheduling.html#dynamic-resource-allocation). This requires `spark.shuffle.service.enabled` to be set. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, and `spark.dynamicAllocation.initialExecutors`. |
| `spark.dynamicAllocation.executorIdleTimeout` | 60s | If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more detail, see this [description](job-scheduling.html#resource-allocation-policy). |
| `spark.dynamicAllocation.cachedExecutorIdleTimeout` | infinity | If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more detail, see this [description](job-scheduling.html#resource-allocation-policy). |
| `spark.dynamicAllocation.initialExecutors` | `spark.dynamicAllocation.minExecutors` | Initial number of executors to run if dynamic allocation is enabled.
If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will be used as the initial number of executors. |
| `spark.dynamicAllocation.initialExecutors` | `spark.dynamicAllocation.minExecutors` | Initial number of executors to run if dynamic allocation is enabled. If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will be used as the initial number of executors. |
| `spark.dynamicAllocation.maxExecutors` | infinity | Upper bound for the number of executors if dynamic allocation is enabled. |
| `spark.dynamicAllocation.minExecutors` | 0 | Lower bound for the number of executors if dynamic allocation is enabled. |
| `spark.dynamicAllocation.schedulerBacklogTimeout` | 1s | If dynamic allocation is enabled and there have been pending tasks backlogged for more than this duration, new executors will be requested. For more detail, see this [description](job-scheduling.html#resource-allocation-policy). |
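For reference, a hedged sketch of how these dynamic-allocation settings might be combined; the executor counts are placeholders, and `spark.shuffle.service.enabled` is included because the table above notes it is required:

```scala
import org.apache.spark.SparkConf

// Placeholder executor counts; adjust to your workload.
val conf = new SparkConf()
  .setAppName("DynamicAllocationExample")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")          // required by dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.initialExecutors", "4")
  .set("spark.dynamicAllocation.maxExecutors", "20")
```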
@@ -344,7 +344,7 @@ val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1))
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[*]” to run Spark Streaming in-process (it detects the number of cores in the local system). Note that this internally creates a [SparkContext](api/scala/index.html#org.apache.spark.SparkContext) (the starting point of all Spark functionality), which can be accessed as `ssc.sparkContext`.
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process (it detects the number of cores in the local system). Note that this internally creates a [SparkContext](api/scala/index.html#org.apache.spark.SparkContext) (the starting point of all Spark functionality), which can be accessed as `ssc.sparkContext`.
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
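For local testing, the same pattern might look like the sketch below, with an explicit `local[*]` master and access to the underlying SparkContext; the application name and batch interval are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// "local[*]" is for local testing only; on a cluster the master is normally
// supplied by spark-submit rather than hardcoded in the program.
val conf = new SparkConf().setAppName("StreamingExample").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval

// The SparkContext created internally is available as:
val sc = ssc.sparkContext
```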
@@ -367,7 +367,7 @@ SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[*]” to run Spark Streaming in-process. Note that this internally creates a [JavaSparkContext](api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html) (starting point of all Spark functionality) which can be accessed as `ssc.sparkContext`.
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process. Note that this internally creates a [JavaSparkContext](api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html) (starting point of all Spark functionality) which can be accessed as `ssc.sparkContext`.
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
@@ -390,7 +390,7 @@ sc = SparkContext(master, appName)
ssc = StreamingContext(sc, 1)
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[*]” to run Spark Streaming in-process (detects the number of cores in the local system).
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process (detects the number of cores in the local system).
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
@@ -597,7 +597,7 @@ The update function will be called for each word, with `newValues` having a sequ
Note that using `updateStateByKey` requires the checkpoint directory to be configured; this is discussed in more detail in the [checkpointing](#checkpointing) section.
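A minimal sketch of the pattern, assuming `pairs` is a `DStream[(String, Int)]` of word counts and using a hypothetical HDFS path for the checkpoint directory:

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Running word count kept as per-key state across batches.
def statefulWordCount(ssc: StreamingContext,
                      pairs: DStream[(String, Int)]): DStream[(String, Int)] = {
  ssc.checkpoint("hdfs://namenode:8020/checkpoints") // required by updateStateByKey; path is hypothetical

  val updateFunction = (newValues: Seq[Int], runningCount: Option[Int]) =>
    Some(runningCount.getOrElse(0) + newValues.sum)

  pairs.updateStateByKey[Int](updateFunction)
}
```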
#### Transform Operation*
#### Transform Operation
The `transform` operation (along with its variations like `transformWith`) allows arbitrary RDD-to-RDD functions to be applied on a DStream. It can be used to apply any RDD operation that is not exposed in the DStream API. For example, the functionality of joining every batch in a data stream with another dataset is not directly exposed in the DStream API; however, you can easily use `transform` to do this, which enables very powerful possibilities. For example, one can do real-time data cleaning by joining the input data stream with precomputed spam information (possibly also generated with Spark) and then filtering based on it.
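A hedged sketch of that spam-filtering idea, assuming `wordCounts` is a `DStream[(String, Int)]` and `spamInfoRDD` is a hypothetical precomputed `RDD[(String, Boolean)]` of known spam words:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

def dropSpam(wordCounts: DStream[(String, Int)],
             spamInfoRDD: RDD[(String, Boolean)]): DStream[(String, Int)] = {
  wordCounts.transform { rdd =>
    rdd.leftOuterJoin(spamInfoRDD)                                  // join each batch with the static RDD
      .filter { case (_, (_, isSpam)) => !isSpam.getOrElse(false) } // drop words flagged as spam
      .mapValues { case (count, _) => count }                       // keep only the original counts
  }
}
```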
@@ -1559,16 +1559,9 @@ The semantics of streaming systems are often captured in terms of how many times the system can process each
| Deployment Scenario | Worker Failure | Driver Failure |
| --- | --- | --- |
| _Spark 1.1 or earlier,_ or
_Spark 1.2 or later without write ahead logs_ | Buffered data lost with unreliable receivers
Zero data loss with reliable receivers
At-least once semantics | Buffered data lost with unreliable receivers
Past data lost with all receivers
Undefined semantics |
| _Spark 1.2 or later with write ahead logs_ | Zero data loss with reliable receivers
At-least once semantics | Zero data loss with reliable receivers and files
At-least once semantics |
| | | |
| _Spark 1.1 or earlier,_ or <br> _Spark 1.2 or later without write ahead logs_ | Buffered data lost with unreliable receivers <br> Zero data loss with reliable receivers <br> At-least once semantics | Buffered data lost with unreliable receivers <br> Past data lost with all receivers <br> Undefined semantics |
| _Spark 1.2 or later with write ahead logs_ | Zero data loss with reliable receivers <br> At-least once semantics | Zero data loss with reliable receivers and files <br> At-least once semantics |
| | | |
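As a hedged sketch of opting into the write ahead log behaviour described in the second row (Spark 1.2+), the relevant configuration key is `spark.streaming.receiver.writeAheadLog.enable`; the application name below is illustrative:

```scala
import org.apache.spark.SparkConf

// Enabling the receiver write ahead log trades some throughput for
// zero data loss with reliable receivers; a checkpoint directory must
// also be set on the StreamingContext so the logs have somewhere to live.
val conf = new SparkConf()
  .setAppName("WalExample")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
```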
### With Kafka Direct API
@@ -1594,7 +1587,7 @@ Output operations (like `foreachRDD`) have _at-least once
}
}
```
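To sketch how the at-least-once guarantee of output operations can be tightened in practice, one common approach (assuming an external sink that can deduplicate or write transactionally, hinted at only in comments here) derives a unique identifier from the batch time and partition index:

```scala
import org.apache.spark.TaskContext
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Sketch only: `counts` is an assumed DStream[(String, Int)]; the actual
// transactional write to an external store is left as a comment.
def saveIdempotently(counts: DStream[(String, Int)]): Unit = {
  counts.foreachRDD { (rdd, time: Time) =>
    rdd.foreachPartition { partitionRecords =>
      val uniqueId = s"${time.milliseconds}-${TaskContext.get.partitionId()}"
      // Use uniqueId as a transaction key so replays of the same batch and
      // partition overwrite (rather than duplicate) previous output, e.g.:
      // externalSink.upsert(uniqueId, partitionRecords.toSeq) // hypothetical sink
    }
  }
}
```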
* * *