@@ -367,7 +367,7 @@ SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));
```
The `appName` parameter is a name for your application to show on the cluster UI. `master` is a [Spark, Mesos or YARN cluster URL](submitting-applications.html#master-urls), or a special **“local[\*]”** string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather [launch the application with `spark-submit`](submitting-applications.html) and receive it there. However, for local testing and unit tests, you can pass “local[\*]” to run Spark Streaming in-process (detects the number of cores in the local system). Note that this internally creates a [JavaSparkContext](api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html) (starting point of all Spark functionality) which can be accessed as `ssc.sparkContext`.
The batch interval must be set based on the latency requirements of your application and available cluster resources. See the [Performance Tuning](#setting-the-right-batch-interval) section for more details.
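Putting the above together, a minimal local-mode setup for testing might look like the following sketch. The class and application names here are hypothetical; only `SparkConf`, `JavaStreamingContext`, and `Duration` come from the Spark API described above.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class LocalStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        // "local[*]" runs Spark Streaming in-process, using all local cores.
        // On a real cluster, omit setMaster and supply it via spark-submit.
        SparkConf conf = new SparkConf()
            .setAppName("MyStreamingTest")  // hypothetical name, shown on the cluster UI
            .setMaster("local[*]");

        // Batch interval of 1000 ms; tune it to your latency requirements
        // and available resources (see Performance Tuning).
        JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));

        // The implicitly created JavaSparkContext is reachable from here:
        // ssc.sparkContext()

        // ... define DStream sources and transformations, then:
        // ssc.start();
        // ssc.awaitTermination();
        ssc.stop();
    }
}
```

Note that creating the `JavaStreamingContext` fixes the batch interval for the lifetime of the context, so choose it before defining any DStreams.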
...
...
@@ -597,7 +597,7 @@ The update function will be called for each word, with `newValues` having a sequ
| Deployment Scenario | Worker Failure | Driver Failure |
| --- | --- | --- |
| _Spark 1.1 or earlier,_ or <br> _Spark 1.2 or later without write ahead logs_ | Buffered data lost with unreliable receivers <br> Zero data loss with reliable receivers <br> At-least once semantics | Buffered data lost with unreliable receivers <br> Past data lost with all receivers <br> Undefined semantics |
| _Spark 1.2 or later with write ahead logs_ | Zero data loss with reliable receivers <br> At-least once semantics | Zero data loss with reliable receivers and files <br> At-least once semantics |