Commit 6f1fc7f9 authored by 取昵称好难啊

Fix markdown syntax

Parent d666601d
@@ -253,7 +253,7 @@ JavaSerializer | Used for serializing objects that will be sent over the network or need to be in serialized form
* Mesos fine-grained mode: 8
* Others: total number of cores on all executor nodes, or 2, whichever is larger
| If not set by the user, this property is the default number of partitions in RDDs returned by transformations like `join`, `reduceByKey`, and `parallelize`. |
| `spark.executor.heartbeatInterval` | 10s | Interval between each executor's heartbeats to the driver. Heartbeats let the driver know that the executor is still alive and update it with metrics for in-progress tasks. |
| `spark.files.fetchTimeout` | 60s | Communication timeout to use when fetching files added through SparkContext.addFile() from the driver. |
| `spark.files.useFetchCache` | true | If set to true (default), file fetching will use a local cache that is shared by executors belonging to the same application, which can improve task launching performance when running many executors on the same host. If set to false, these caching optimizations will be disabled and all executors will fetch their own copies of files. This optimization can be disabled in order to use Spark local directories that reside on NFS filesystems (see [SPARK-6313](https://issues.apache.org/jira/browse/SPARK-6313) for more details). |
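As a rough illustration of how properties like those above can be supplied programmatically (they can equally be passed to `spark-submit` with `--conf`), here is a minimal Scala sketch; the application name and all values are illustrative, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; tune them for your own workload and cluster.
val conf = new SparkConf()
  .setAppName("ConfigExample")
  .set("spark.default.parallelism", "200")        // default partition count for join/reduceByKey/parallelize
  .set("spark.executor.heartbeatInterval", "10s") // executor-to-driver heartbeat interval
  .set("spark.files.useFetchCache", "false")      // e.g. when Spark local dirs live on NFS (SPARK-6313)

val sc = new SparkContext(conf)
```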
@@ -321,14 +321,10 @@ It also allows a different address from the local one to be advertised to execut
| Property Name | Default | Meaning |
| --- | --- | --- |
| `spark.dynamicAllocation.enabled` | false | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. For more detail, see the description [here](job-scheduling.html#dynamic-resource-allocation).
This requires `spark.shuffle.service.enabled` to be set. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, and `spark.dynamicAllocation.initialExecutors`. |
| `spark.dynamicAllocation.enabled` | false | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. For more detail, see the description [here](job-scheduling.html#dynamic-resource-allocation). This requires `spark.shuffle.service.enabled` to be set. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, and `spark.dynamicAllocation.initialExecutors`. |
| `spark.dynamicAllocation.executorIdleTimeout` | 60s | If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more detail, see this [description](job-scheduling.html#resource-allocation-policy). |
| `spark.dynamicAllocation.cachedExecutorIdleTimeout` | infinity | If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more detail, see this [description](job-scheduling.html#resource-allocation-policy). |
| `spark.dynamicAllocation.initialExecutors` | `spark.dynamicAllocation.minExecutors` | Initial number of executors to run if dynamic allocation is enabled.
If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will be used as the initial number of executors. |
| `spark.dynamicAllocation.initialExecutors` | `spark.dynamicAllocation.minExecutors` | Initial number of executors to run if dynamic allocation is enabled. If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will be used as the initial number of executors. |
| `spark.dynamicAllocation.maxExecutors` | infinity | Upper bound for the number of executors if dynamic allocation is enabled. |
| `spark.dynamicAllocation.minExecutors` | 0 | Lower bound for the number of executors if dynamic allocation is enabled. |
| `spark.dynamicAllocation.schedulerBacklogTimeout` | 1s | If dynamic allocation is enabled and there have been pending tasks backlogged for more than this duration, new executors will be requested. For more detail, see this [description](job-scheduling.html#resource-allocation-policy). |
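For reference, a hedged sketch of how these dynamic-allocation settings might be combined; the executor counts are placeholders, and `spark.shuffle.service.enabled` is included because the table above notes it is required:

```scala
import org.apache.spark.SparkConf

// Placeholder executor counts; adjust to your workload.
val conf = new SparkConf()
  .setAppName("DynamicAllocationExample")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")          // required by dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.initialExecutors", "4")
  .set("spark.dynamicAllocation.maxExecutors", "20")
```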
@@ -344,7 +344,7 @@ val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1))
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[*]” to run Spark Streaming in-process (it detects the number of cores in the local system). Note that this internally creates a [SparkContext](api/scala/index.html#org.apache.spark.SparkContext) (the starting point of all Spark functionality), which can be accessed as `ssc.sparkContext`.
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process (it detects the number of cores in the local system). Note that this internally creates a [SparkContext](api/scala/index.html#org.apache.spark.SparkContext) (the starting point of all Spark functionality), which can be accessed as `ssc.sparkContext`.
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
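For local testing, the same pattern might look like the sketch below, with an explicit `local[*]` master and access to the underlying SparkContext; the application name and batch interval are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// "local[*]" is for local testing only; on a cluster the master is normally
// supplied by spark-submit rather than hardcoded in the program.
val conf = new SparkConf().setAppName("StreamingExample").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval

// The SparkContext created internally is available as:
val sc = ssc.sparkContext
```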
@@ -367,7 +367,7 @@ SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[*]” to run Spark Streaming in-process. Note that this internally creates a [JavaSparkContext](api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html) (starting point of all Spark functionality) which can be accessed as `ssc.sparkContext`.
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process. Note that this internally creates a [JavaSparkContext](api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html) (starting point of all Spark functionality) which can be accessed as `ssc.sparkContext`.
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
@@ -390,7 +390,7 @@ sc = SparkContext(master, appName)
ssc = StreamingContext(sc, 1)
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[*]” to run Spark Streaming in-process (detects the number of cores in the local system).
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process (detects the number of cores in the local system).
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
@@ -597,7 +597,7 @@ The update function will be called for each word, with `newValues` having a sequ
Note that using `updateStateByKey` requires the checkpoint directory to be configured; this is discussed in more detail in the [checkpointing](#checkpointing) section.
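A minimal sketch of the pattern, assuming `pairs` is a `DStream[(String, Int)]` of word counts and using a hypothetical HDFS path for the checkpoint directory:

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Running word count kept as per-key state across batches.
def statefulWordCount(ssc: StreamingContext,
                      pairs: DStream[(String, Int)]): DStream[(String, Int)] = {
  ssc.checkpoint("hdfs://namenode:8020/checkpoints") // required by updateStateByKey; path is hypothetical

  val updateFunction = (newValues: Seq[Int], runningCount: Option[Int]) =>
    Some(runningCount.getOrElse(0) + newValues.sum)

  pairs.updateStateByKey[Int](updateFunction)
}
```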
#### Transform Operation*
#### Transform Operation
The `transform` operation (along with its variations like `transformWith`) allows arbitrary RDD-to-RDD functions to be applied on a DStream. It can be used to apply any RDD operation that is not exposed in the DStream API. For example, the functionality of joining every batch in a data stream with another dataset is not directly exposed in the DStream API; however, you can easily use `transform` to do this, which enables very powerful possibilities. For example, one can do real-time data cleaning by joining the input data stream with precomputed spam information (possibly also generated with Spark) and then filtering based on it.
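A hedged sketch of that spam-filtering idea, assuming `wordCounts` is a `DStream[(String, Int)]` and `spamInfoRDD` is a hypothetical precomputed `RDD[(String, Boolean)]` of known spam words:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

def dropSpam(wordCounts: DStream[(String, Int)],
             spamInfoRDD: RDD[(String, Boolean)]): DStream[(String, Int)] = {
  wordCounts.transform { rdd =>
    rdd.leftOuterJoin(spamInfoRDD)                                  // join each batch with the static RDD
      .filter { case (_, (_, isSpam)) => !isSpam.getOrElse(false) } // drop words flagged as spam
      .mapValues { case (count, _) => count }                       // keep only the original counts
  }
}
```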
@@ -1559,16 +1559,9 @@ The semantics of streaming systems are often captured in terms of how many times the system can process each
| Deployment Scenario | Worker Failure | Driver Failure |
| --- | --- | --- |
| _Spark 1.1 or earlier,_ or
_Spark 1.2 or later without write ahead logs_ | Buffered data lost with unreliable receivers
Zero data loss with reliable receivers
At-least once semantics | Buffered data lost with unreliable receivers
Past data lost with all receivers
Undefined semantics |
| _Spark 1.2 or later with write ahead logs_ | Zero data loss with reliable receivers
At-least once semantics | Zero data loss with reliable receivers and files
At-least once semantics |
| | | |
| _Spark 1.1 or earlier,_ or <br> _Spark 1.2 or later without write ahead logs_ | Buffered data lost with unreliable receivers <br> Zero data loss with reliable receivers <br> At-least once semantics | Buffered data lost with unreliable receivers <br> Past data lost with all receivers <br> Undefined semantics |
| _Spark 1.2 or later with write ahead logs_ | Zero data loss with reliable receivers <br> At-least once semantics | Zero data loss with reliable receivers and files <br> At-least once semantics |
| | | |
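As a hedged sketch of opting into the write ahead log behaviour described in the second row (Spark 1.2+), the relevant configuration key is `spark.streaming.receiver.writeAheadLog.enable`; the application name below is illustrative:

```scala
import org.apache.spark.SparkConf

// Enabling the receiver write ahead log trades some throughput for
// zero data loss with reliable receivers; a checkpoint directory must
// also be set on the StreamingContext so the logs have somewhere to live.
val conf = new SparkConf()
  .setAppName("WalExample")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
```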
### With Kafka Direct API
@@ -1594,7 +1587,7 @@ Output operations (like `foreachRDD`) have _at-least once
}
}
```
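To sketch how the at-least-once guarantee of output operations can be tightened in practice, one common approach (assuming an external sink that can deduplicate or write transactionally, hinted at only in comments here) derives a unique identifier from the batch time and partition index:

```scala
import org.apache.spark.TaskContext
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Sketch only: `counts` is an assumed DStream[(String, Int)]; the actual
// transactional write to an external store is left as a comment.
def saveIdempotently(counts: DStream[(String, Int)]): Unit = {
  counts.foreachRDD { (rdd, time: Time) =>
    rdd.foreachPartition { partitionRecords =>
      val uniqueId = s"${time.milliseconds}-${TaskContext.get.partitionId()}"
      // Use uniqueId as a transaction key so replays of the same batch and
      // partition overwrite (rather than duplicate) previous output, e.g.:
      // externalSink.upsert(uniqueId, partitionRecords.toSeq) // hypothetical sink
    }
  }
}
```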
* * *