Commit 8ec47f17 authored by Maximilian Michels

[FLINK-3876] improve documentation of Scala Shell

- restructure sections
- improve readability
Parent 9d6194fa
Flink comes with an integrated interactive Scala Shell.
It can be used in a local setup as well as in a cluster setup.
To use the shell with an integrated Flink cluster just execute:
bin/start-scala-shell.sh local
~~~
in the root directory of your binary Flink directory. To run the Shell on a
cluster, please see the Setup section below.
## Usage
The shell supports Batch and Streaming.
Two different ExecutionEnvironments are automatically prebound after startup.
Use "benv" and "senv" to access the Batch and Streaming environment respectively.
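As a minimal sketch (assuming the shell started successfully), each prebound environment can be exercised directly at the prompt:

~~~scala
Scala-Flink> benv.fromElements(1, 2, 3).map(_ * 2).print()   // batch: print triggers execution
Scala-Flink> val nums = senv.fromElements(1, 2, 3).map(_ * 2)
Scala-Flink> nums.print()
Scala-Flink> senv.execute("tiny streaming example")          // streaming: execute is required
~~~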
### DataSet API
The following example will execute the wordcount program in the Scala shell:
~~~scala
Scala-Flink> val text = benv.fromElements(
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,")
Scala-Flink> val counts = text
.flatMap { _.toLowerCase.split("\\W+") }
.map { (_, 1) }.groupBy(0).sum(1)
Scala-Flink> counts.print()
~~~
It is possible to write results to a file. However, in this case you need to call `execute` to run your program:

~~~scala
Scala-Flink> benv.execute("MyProgram")
~~~
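For example, a hedged sketch (the output path `/tmp/wordcount-out` is hypothetical) that writes the word counts computed above to a file and then triggers execution:

~~~scala
Scala-Flink> counts.writeAsText("/tmp/wordcount-out")
Scala-Flink> benv.execute("WordCount to file")
~~~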
### DataStream API
Similar to the batch program above, we can execute a streaming program through the DataStream API:
~~~scala
Scala-Flink> val textStreaming = senv.fromElements(
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,")
Scala-Flink> val countsStreaming = textStreaming
.flatMap { _.toLowerCase.split("\\W+") }
.map { (_, 1) }.keyBy(0).sum(1)
Scala-Flink> countsStreaming.print()
Scala-Flink> senv.execute("Streaming Wordcount")
~~~
Note that in the streaming case, the print operation does not trigger execution directly.
The Flink Shell comes with command history and auto-completion.
## Adding external dependencies
It is possible to add external classpaths to the Scala shell. These will be sent to the JobManager automatically, alongside your shell program, when calling `execute`.
Use the parameter `-a <path/to/jar.jar>` or `--addclasspath <path/to/jar.jar>` to load additional classes.
~~~bash
bin/start-scala-shell.sh [local | remote <host> <port> | yarn] --addclasspath <path/to/jar.jar>
~~~
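For illustration, assume a hypothetical jar `/path/to/myLib.jar` that contains a class `org.example.util.TextUtil` (both names are made up for this sketch). You could start a local shell with the jar on the classpath and use the class in a program:

~~~bash
bin/start-scala-shell.sh local --addclasspath /path/to/myLib.jar
~~~

~~~scala
Scala-Flink> import org.example.util.TextUtil        // from the added jar (hypothetical)
Scala-Flink> val cleaned = benv.fromElements("Foo ", " bar").map(s => TextUtil.trim(s))
~~~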
## Setup
To get an overview of what options the Scala Shell provides, please use
~~~bash
bin/start-scala-shell.sh --help
~~~
### Local
To use the shell with an integrated Flink cluster just execute:
~~~bash
bin/start-scala-shell.sh local
~~~
### Remote
To use it with a running cluster, start the Scala shell with the keyword `remote`
and supply the host and port of the JobManager:
~~~bash
bin/start-scala-shell.sh remote <hostname> <portnumber>
~~~
### Yarn Scala Shell cluster
The shell can deploy a Flink cluster to YARN for its exclusive use. The number
of YARN containers can be controlled by the parameter `-n <arg>`.
The shell deploys a new Flink cluster on YARN and connects to it. You can also
specify options for the YARN cluster, such as the memory for the JobManager or
the name of the YARN application.
For example, to start a Yarn cluster for the Scala Shell with two TaskManagers,
use the following:
~~~bash
bin/start-scala-shell.sh yarn -n 2
~~~
For all other options, see the full reference at the bottom.
### Yarn Session
If you have previously deployed a Flink cluster using the Flink Yarn Session,
the Scala shell can connect to it with the following command:
~~~bash
bin/start-scala-shell.sh yarn
~~~
## Full Reference
~~~bash
Flink Scala Shell
Usage: start-scala-shell.sh [local|remote|yarn] [options] <args>...
Command: local [options]
Starts Flink scala shell with a local Flink cluster
-a <path/to/jar> | --addclasspath <path/to/jar>
Specifies additional jars to be used in Flink
Command: remote [options] <host> <port>
Starts Flink scala shell connecting to a remote cluster
<host>
Remote host name as string
<port>
Remote port as integer
-a <path/to/jar> | --addclasspath <path/to/jar>
Specifies additional jars to be used in Flink
Command: yarn [options]
Starts Flink scala shell connecting to a yarn cluster
-n arg | --container arg
Number of YARN container to allocate (= Number of TaskManagers)
-jm arg | --jobManagerMemory arg
Memory for JobManager container [in MB]
-nm <value> | --name <value>
Set a custom name for the application on YARN
-qu <arg> | --queue <arg>
Specifies YARN queue
-s <arg> | --slots <arg>
Number of slots per TaskManager
-tm <arg> | --taskManagerMemory <arg>
Memory per TaskManager container [in MB]
-a <path/to/jar> | --addclasspath <path/to/jar>
Specifies additional jars to be used in Flink
--configDir <value>
The configuration directory.
-h | --help
Prints this usage text
~~~