From 8ec47f17be0b20e5204d309f72b0bec9b234a7fb Mon Sep 17 00:00:00 2001
From: Maximilian Michels
Date: Wed, 4 May 2016 19:44:57 +0200
Subject: [PATCH] [FLINK-3876] improve documentation of Scala Shell

- restructure sections
- improve readability
---
 docs/apis/scala_shell.md | 138 +++++++++++++++++++++++++++++++--------
 1 file changed, 112 insertions(+), 26 deletions(-)

diff --git a/docs/apis/scala_shell.md b/docs/apis/scala_shell.md
index a815f18aa70..0377f5a4aea 100644
--- a/docs/apis/scala_shell.md
+++ b/docs/apis/scala_shell.md
@@ -25,10 +25,8 @@ under the License.
 
 Flink comes with an integrated interactive Scala Shell.
-It can be used in a local setup as well as in a cluster setup. To get started with downloading
-Flink and setting up a cluster please refer to
-[local setup]({{ site.baseurl }}/setup/local_setup.html) or
-[cluster setup]({{ site.baseurl }}/setup/cluster_setup.html)
+It can be used in a local setup as well as in a cluster setup.
+
 To use the shell with an integrated Flink cluster just execute:
 
@@ -36,14 +34,9 @@ To use the shell with an integrated Flink cluster just execute:
 ~~~bash
 bin/start-scala-shell.sh local
 ~~~
 
-in the root directory of your binary Flink directory.
-
-To use it with a running cluster start the scala shell with the keyword `remote`
-and supply the host and port of the JobManager with:
+in the root directory of your Flink binary distribution. To run the shell on a
+cluster, please see the Setup section below.
 
-~~~bash
-bin/start-scala-shell.sh remote <hostname> <portnumber>
-~~~
 
 ## Usage
 
@@ -51,6 +44,8 @@ The shell supports Batch and Streaming.
 Two different ExecutionEnvironments are automatically prebound after startup.
 Use "benv" and "senv" to access the Batch and Streaming environment respectively.
 
+### DataSet API
+
 The following example will execute the wordcount program in the Scala shell:
 
 ~~~scala
@@ -59,7 +54,9 @@ Scala-Flink> val text = benv.fromElements(
   "Whether 'tis nobler in the mind to suffer",
   "The slings and arrows of outrageous fortune",
   "Or to take arms against a sea of troubles,")
-Scala-Flink> val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
+Scala-Flink> val counts = text
+    .flatMap { _.toLowerCase.split("\\W+") }
+    .map { (_, 1) }.groupBy(0).sum(1)
 Scala-Flink> counts.print()
 ~~~
 
@@ -71,7 +68,9 @@ It is possbile to write results to a file. However, in this case you need to cal
 
 ~~~scala
 Scala-Flink> benv.execute("MyProgram")
 ~~~
 
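+For example, to write the word counts from above to a file instead of
+printing them, something like the following can be used (a minimal sketch;
+the output path is only an illustration):
+
+~~~scala
+// writeAsText only declares the sink; the job runs once benv.execute is called
+Scala-Flink> counts.writeAsText("/tmp/wordcount-out")
+Scala-Flink> benv.execute("WordCount to file")
+~~~
+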
-The Batch program above can be executed using the Streaming API through:
+### DataStream API
+
+Similar to the batch program above, we can execute a streaming program through the DataStream API:
 
 ~~~scala
 Scala-Flink> val textStreaming = senv.fromElements(
@@ -79,33 +78,120 @@ Scala-Flink> val textStreaming = senv.fromElements(
   "Whether 'tis nobler in the mind to suffer",
   "The slings and arrows of outrageous fortune",
   "Or to take arms against a sea of troubles,")
- Scala-Flink> val countsStreaming = textStreaming.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.keyBy(0).sum(1)
- Scala-Flink> countsStreaming.print()
- Scala-Flink> senv.execute("Streaming Wordcount")
+Scala-Flink> val countsStreaming = textStreaming
+    .flatMap { _.toLowerCase.split("\\W+") }
+    .map { (_, 1) }.keyBy(0).sum(1)
+Scala-Flink> countsStreaming.print()
+Scala-Flink> senv.execute("Streaming Wordcount")
 ~~~
 
-Note, that in the Streaming case, the print operation does not trigger execution directly.
+Note that in the streaming case, the print operation does not trigger execution directly.
 
-The Flink Shell comes with command history and autocompletion.
+The Flink Shell comes with command history and auto-completion.
+
+
+## Adding external dependencies
 
-## Scala Shell with Flink on YARN
+It is possible to add external classpaths to the Scala shell. These are sent to the JobManager automatically, alongside your shell program, when calling `execute`.
 
-The Scala shell can connect Flink cluster on YARN. To connect deployed Flink cluster on YARN, use following command:
+Use the parameter `-a <path/to/jar.jar>` or `--addclasspath <path/to/jar.jar>` to load additional classes.
 
 ~~~bash
-bin/start-scala-shell.sh yarn
+bin/start-scala-shell.sh [local | remote | yarn] --addclasspath <path/to/jar.jar>
 ~~~
 
-The shell reads the connection information of the deployed Flink cluster from the `.yarn-properties` file, which is created in the configured `yarn.properties-file.location` directory or the temporary directory. If there is no deployed Flink cluster on YARN, the shell prints an error message.
 
-The shell can deploy a Flink cluster to YARN, which is used exclusively by the shell. The number of YARN containers can be controlled by the parameter `-n <arg>`. The shell deploys a new Flink cluster on YARN and connects the cluster. You can also specify options for YARN cluster such as memory for JobManager, name of YARN application, etc..
+## Setup
 
-## Adding external dependencies
+To get an overview of what options the Scala Shell provides, please use:
 
-It is possible to add external classpaths to the Scala-shell. These will be sent to the Jobmanager automatically alongside your shell program, when calling execute.
+~~~bash
+bin/start-scala-shell.sh --help
+~~~
 
-Use the parameter `-a <path/to/jar.jar>` or `--addclasspath <path/to/jar.jar>` to load additional classes.
+### Local
+
+To use the shell with an integrated Flink cluster just execute:
 
 ~~~bash
-bin/start-scala-shell.sh [local | remote | yarn] --addclasspath <path/to/jar.jar>
+bin/start-scala-shell.sh local
+~~~
+
+
+### Remote
+
+To use it with a running cluster, start the Scala shell with the keyword `remote`
+and supply the host and port of the JobManager:
+
+~~~bash
+bin/start-scala-shell.sh remote <hostname> <portnumber>
+~~~
+
+### Yarn Scala Shell cluster
+
+The shell can deploy a Flink cluster to YARN, which is used exclusively by the
+shell. The number of YARN containers can be controlled by the parameter `-n <arg>`.
+The shell deploys a new Flink cluster on YARN and connects to the
+cluster. You can also specify options for the YARN cluster, such as the memory
+for the JobManager or the name of the YARN application.
+
+For example, to start a Yarn cluster for the Scala Shell with two TaskManagers,
+use the following:
+
+~~~bash
+bin/start-scala-shell.sh yarn -n 2
+~~~
+
+For all other options, see the full reference at the bottom.
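+
+Several of these options can be combined in one invocation, for instance along
+the following lines (a sketch; the values shown are only illustrative):
+
+~~~bash
+# 2 TaskManagers with 2 slots and 1024 MB each; values are illustrative
+bin/start-scala-shell.sh yarn -n 2 -s 2 -tm 1024 --name "Scala Shell on YARN"
+~~~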
+
+
+### Yarn Session
+
+If you have previously deployed a Flink cluster using the Flink Yarn Session,
+the Scala shell can connect to it with the following command:
+
+~~~bash
+bin/start-scala-shell.sh yarn
+~~~
+
+
+## Full Reference
+
+~~~bash
+Flink Scala Shell
+Usage: start-scala-shell.sh [local|remote|yarn] [options] <args>...
+
+Command: local [options]
+Starts Flink scala shell with a local Flink cluster
+  -a <path/to/jar> | --addclasspath <path/to/jar>
+        Specifies additional jars to be used in Flink
+Command: remote [options] <host> <port>
+Starts Flink scala shell connecting to a remote cluster
+  <host>
+        Remote host name as string
+  <port>
+        Remote port as integer
+
+  -a <path/to/jar> | --addclasspath <path/to/jar>
+        Specifies additional jars to be used in Flink
+Command: yarn [options]
+Starts Flink scala shell connecting to a yarn cluster
+  -n arg | --container arg
+        Number of YARN container to allocate (= Number of TaskManagers)
+  -jm arg | --jobManagerMemory arg
+        Memory for JobManager container [in MB]
+  -nm <value> | --name <value>
+        Set a custom name for the application on YARN
+  -qu <arg> | --queue <arg>
+        Specifies YARN queue
+  -s <arg> | --slots <arg>
+        Number of slots per TaskManager
+  -tm <arg> | --taskManagerMemory <arg>
+        Memory per TaskManager container [in MB]
+  -a <path/to/jar> | --addclasspath <path/to/jar>
+        Specifies additional jars to be used in Flink
+  --configDir <value>
+        The configuration directory.
+  -h | --help
+        Prints this usage text
+~~~
-- 
GitLab