Commit f7f1ed2f, authored by Sebastian Kunert, committed by uce

Update Quickstarts Java API, Run Example, and Setup Quickstart

This closes #39.
Parent 515ad3c3
......@@ -2,72 +2,68 @@
title: "Quickstart: Java API"
---
<p class="lead">Start working on your Stratosphere Java program in a few simple steps.</p>
<section id="requirements">
<div class="page-header"><h2>Requirements</h2></div>
<p class="lead">The only requirements are working <strong>Maven 3.0.4</strong> (or higher) and <strong>Java 6.x</strong> (or higher) installations.</p>
</section>
<section id="create_project">
<div class="page-header"><h2>Create Project</h2></div>
<p class="lead">Use one of the following commands to <strong>create a project</strong>:</p>
<ul class="nav nav-tabs" style="border-bottom: none;">
<li class="active"><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li>
<li><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="quickstart-script">
{% highlight bash %}
$ curl https://raw.githubusercontent.com/stratosphere/stratosphere-quickstart/master/quickstart.sh | bash
{% endhighlight %}
</div>
<div class="tab-pane" id="maven-archetype">
{% highlight bash %}
$ mvn archetype:generate \
-DarchetypeGroupId=eu.stratosphere \
-DarchetypeArtifactId=quickstart-java \
-DarchetypeVersion={{site.current_stable}}
{% endhighlight %}
This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name.
</div>
</div>
</section>
<section id="inspect_project">
<div class="page-header"><h2>Inspect Project</h2></div>
<p class="lead">There will be a <strong>new directory in your working directory</strong>. If you've used the <em>curl</em> approach, the directory is called <code>quickstart</code>. Otherwise, it has the name of your artifactId.</p>
<p class="lead">The sample project is a <strong>Maven project</strong>, which contains two classes. <em>Job</em> is a basic skeleton program and <em>WordCountJob</em> a working example. Please note that the <em>main</em> methods of both classes allow you to start Stratosphere in a development/testing mode.</p>
<p class="lead">We recommend <strong>importing this project into your IDE</strong> to develop and test it. If you use Eclipse, the <a href="http://www.eclipse.org/m2e/">m2e plugin</a> allows you to <a href="http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import">import Maven projects</a>. Some Eclipse bundles include that plugin by default, others require you to install it manually. The IntelliJ IDE also supports Maven projects out of the box.</p>
</section>
<section id="build_project">
<div class="alert alert-danger">A note to Mac OS X users: The default JVM heap size is too small for Stratosphere, so you have to increase it manually. In Eclipse, choose "Run Configurations" -> Arguments and enter "-Xmx800m" in the "VM Arguments" box.</div>
<div class="page-header"><h2>Build Project</h2></div>
<p class="lead">If you want to <strong>build your project</strong>, go to your project directory and issue the <code>mvn clean package</code> command. You will <strong>find a jar</strong> that runs on every Stratosphere cluster in <code>target/stratosphere-project-0.1-SNAPSHOT.jar</code>.</p>
</section>
<section id="next_steps">
<div class="page-header"><h2>Next Steps</h2></div>
<p class="lead"><strong>Write your application!</strong></p>
<p>The quickstart project contains a WordCount implementation, the "Hello World" of Big Data processing systems. The goal of WordCount is to determine the frequencies of words in a text, e.g., how often the terms "the" or "house" occur in all Wikipedia texts.</p>
<br>
<b>Sample Input:</b> <br>
{% highlight bash %}
Start working on your Stratosphere Java program in a few simple steps.
# Requirements
The only requirements are working __Maven 3.0.4__ (or higher) and __Java 6.x__ (or higher) installations.
# Create Project
Use one of the following commands to __create a project__:
<ul class="nav nav-tabs" style="border-bottom: none;">
<li class="active"><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li>
<li><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="quickstart-script">
{% highlight bash %}
$ curl https://raw.githubusercontent.com/stratosphere/stratosphere-quickstart/master/quickstart.sh | bash
{% endhighlight %}
</div>
<div class="tab-pane" id="maven-archetype">
{% highlight bash %}
$ mvn archetype:generate \
-DarchetypeGroupId=eu.stratosphere \
-DarchetypeArtifactId=quickstart-java \
-DarchetypeVersion={{site.current_stable}}
{% endhighlight %}
This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name.
</div>
</div>
# Inspect Project
There will be a new directory in your working directory. If you've used the _curl_ approach, the directory is called `quickstart`. Otherwise, it has the name of your artifactId.
The sample project is a __Maven project__, which contains two classes. _Job_ is a basic skeleton program and _WordCountJob_ a working example. Please note that the _main_ methods of both classes allow you to start Stratosphere in a development/testing mode.
We recommend __importing this project into your IDE__ to develop and test it. If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/) allows you to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import). Some Eclipse bundles include that plugin by default, others require you to install it manually. The IntelliJ IDE also supports Maven projects out of the box.
A note to Mac OS X users: The default JVM heap size is too small for Stratosphere, so you have to increase it manually. In Eclipse, choose "Run Configurations" -> Arguments and enter "-Xmx800m" in the "VM Arguments" box.
# Build Project
If you want to __build your project__, go to your project directory and issue the `mvn clean package` command. You will __find a jar__ that runs on every Stratosphere cluster in `target/stratosphere-project-0.1-SNAPSHOT.jar`.
# Next Steps
Write your application!
The quickstart project contains a WordCount implementation, the "Hello World" of Big Data processing systems. The goal of WordCount is to determine the frequencies of words in a text, e.g., how often the terms "the" or "house" occur in all Wikipedia texts.
__Sample Input__:
```bash
big data is big
{% endhighlight %}
<b>Sample Output:</b> <br>
{% highlight bash %}
```
__Sample Output__:
```bash
big 2
data 1
is 1
{% endhighlight %}
<p>The following code shows the WordCount implementation from the Quickstart, which processes some text lines with two operators (FlatMap and Reduce) and prints the resulting words and counts to std-out.</p>
```
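To make the sample input and output concrete, here is a minimal plain-Java sketch that computes the same counts without any Stratosphere API. The class name `WordCountSketch` and the method `countWords` are illustrative and not part of the quickstart project; the sketch also assumes Java 8 or newer for `Map.merge` and lambdas, which is newer than the Java 6 minimum stated above.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Splits the text on non-word characters and counts each lower-cased token.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "word count" pair per line, sorted by word.
        countWords("big data is big").forEach((word, count) ->
                System.out.println(word + " " + count));
    }
}
```

Splitting a line into words corresponds roughly to the FlatMap step, and summing the ones per word corresponds to the Reduce step.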
The following code shows the WordCount implementation from the Quickstart, which processes some text lines with two operators (FlatMap and Reduce) and prints the resulting words and counts to std-out.
{% highlight java %}
```java
public class WordCount {
public static void main(String[] args) throws Exception {
......@@ -97,11 +93,11 @@ public class WordCount {
env.execute("WordCount Example");
}
}
{% endhighlight %}
```
<p>The operations are defined by specialized classes, here the LineSplitter class.</p>
The operations are defined by specialized classes, here the LineSplitter class.
{% highlight java %}
```java
public class LineSplitter extends FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
......@@ -117,10 +113,7 @@ public class LineSplitter extends FlatMapFunction<String, Tuple2<String, Integer
}
}
}
```
[Check GitHub](https://github.com/apache/incubator-flink/blob/master/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCount.java) for the full example code.
{% endhighlight %}
<p><a href="https://github.com/stratosphere/stratosphere/blob/release-{{site.current_stable}}/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCount.java">Check GitHub</a> for the full example code.</p>
<p class="lead">For a complete overview of our Java API, have a look at the <a href="{{ site.baseurl }}/docs/{{site.current_stable_documentation}}/programming_guides/java.html">Stratosphere Documentation</a> and <a href="{{ site.baseurl }}/docs/{{site.current_stable_documentation}}/programming_guides/examples_java.html">further example programs</a>. If you have any trouble, ask on our <a href="https://groups.google.com/forum/#!forum/stratosphere-dev">mailing list</a>. We are happy to provide help.</p>
</section>
For a complete overview of our Java API, have a look at the [API Documentation](java_api_guide.html) and [further example programs](java_api_examples.html). If you have any trouble, ask on our [Mailing List](http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/). We are happy to provide help.
......@@ -2,81 +2,61 @@
title: "Quick Start: Run K-Means Example"
---
This guide will demonstrate Stratosphere's features by example. You will see how you can leverage Stratosphere's iteration feature to find clusters in a dataset using [K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering).
On the way, you will see the compiler, the status interface and the result of the algorithm.
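As background, a single K-Means iteration can be sketched in plain Java: assign every point to its nearest center, then move each center to the mean of its assigned points. This is a hypothetical, single-machine illustration for 2-D points; the class `KMeansStepSketch` and its methods are made up for this guide and are not the Stratosphere example code.

```java
public class KMeansStepSketch {

    // Returns the index of the center closest to the point (squared Euclidean distance).
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dx = point[0] - centers[c][0];
            double dy = point[1] - centers[c][1];
            double distance = dx * dx + dy * dy;
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    // Performs one iteration and returns the updated centers.
    // A center that attracts no points keeps its old position.
    public static double[][] iterate(double[][] points, double[][] centers) {
        double[][] sums = new double[centers.length][2];
        int[] counts = new int[centers.length];
        for (double[] point : points) {
            int c = nearestCenter(point, centers);
            sums[c][0] += point[0];
            sums[c][1] += point[1];
            counts[c]++;
        }
        double[][] updated = new double[centers.length][2];
        for (int c = 0; c < centers.length; c++) {
            if (counts[c] == 0) {
                updated[c] = centers[c].clone();
            } else {
                updated[c][0] = sums[c][0] / counts[c];
                updated[c][1] = sums[c][1] / counts[c];
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centers = {{1, 1}, {9, 9}};
        for (double[] center : iterate(points, centers)) {
            System.out.println(center[0] + " " + center[1]);
        }
    }
}
```

Repeating `iterate` a fixed number of times, feeding each result back in as the new centers, is the loop that the iteration count passed to the job controls.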
# Generate Input Data
Stratosphere contains a data generator for K-Means.
# Download Stratosphere
wget {{ site.current_stable_dl }}
tar xzf stratosphere-*.tgz
cd stratosphere-*
mkdir kmeans
cd kmeans
# Run data generator
java -cp ../examples/stratosphere-java-examples-{{ site.current_stable }}-KMeans.jar eu.stratosphere.example.java.clustering.util.KMeansDataGenerator 500 10 0.08
cp /tmp/points .
cp /tmp/centers .
<p class="lead">
This guide will demonstrate Stratosphere's features by example. You will see how you can leverage Stratosphere's Iteration-feature to find clusters in a dataset using <a href="http://en.wikipedia.org/wiki/K-means_clustering">K-Means clustering</a>.
On the way, you will see the compiler, the status interface and the result of the algorithm.
</p>
<section id="data">
<div class="page-header">
<h2>Generate Input Data</h2>
</div>
<p>Stratosphere contains a data generator for K-Means.</p>
{% highlight bash %}
# Download Stratosphere
wget {{ site.current_stable_dl }}
tar xzf stratosphere-*.tgz
cd stratosphere-*
mkdir kmeans
cd kmeans
# run data generator
java -cp ../examples/stratosphere-java-examples-{{ site.current_stable }}-KMeans.jar eu.stratosphere.example.java.clustering.util.KMeansDataGenerator 500 10 0.08
cp /tmp/points .
cp /tmp/centers .
{% endhighlight %}
The generator has the following arguments:
{% highlight bash %}
KMeansDataGenerator <numberOfDataPoints> <numberOfClusterCenters> [<relative stddev>] [<centroid range>] [<seed>]
{% endhighlight %}
The <i>relative standard deviation</i> is an interesting tuning parameter: it determines the closeness of the points to the centers.
<p>The <code>kmeans/</code> directory should now contain two files: <code>centers</code> and <code>points</code>.</p>
KMeansDataGenerator <numberOfDataPoints> <numberOfClusterCenters> [<relative stddev>] [<centroid range>] [<seed>]
The _relative standard deviation_ is an interesting tuning parameter: it determines the closeness of the points to the centers.
The `kmeans/` directory should now contain two files: `centers` and `points`.
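To illustrate how a relative standard deviation can control how tightly points cluster around their centers, here is a small plain-Java sketch. It is not the code of `KMeansDataGenerator`: the class `ClusterDataSketch`, the Gaussian noise model, and the scaling of the deviation by a `range` parameter are all assumptions made for illustration.

```java
import java.util.Random;

public class ClusterDataSketch {

    // Draws numPoints 2-D points. Each point is placed around a randomly chosen
    // center with Gaussian noise of standard deviation relativeStddev * range
    // (an assumed noise model, not taken from the real generator).
    public static double[][] generatePoints(double[][] centers, int numPoints,
                                            double relativeStddev, double range, long seed) {
        Random random = new Random(seed);
        double absoluteStddev = relativeStddev * range;
        double[][] points = new double[numPoints][2];
        for (int i = 0; i < numPoints; i++) {
            double[] center = centers[random.nextInt(centers.length)];
            points[i][0] = center[0] + random.nextGaussian() * absoluteStddev;
            points[i][1] = center[1] + random.nextGaussian() * absoluteStddev;
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] centers = {{-50.0, 20.0}, {30.0, -10.0}};
        // A smaller relative stddev keeps the points closer to their centers.
        for (double[] point : generatePoints(centers, 5, 0.08, 100.0, 1L)) {
            System.out.printf("%.2f %.2f%n", point[0], point[1]);
        }
    }
}
```

With a relative standard deviation of 0, every generated point coincides with one of the centers; larger values spread the clusters until they overlap.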
<h2>Review Input Data</h2>
Use the <code>plotPoints.py</code> tool to review the result of the data generator. <a href="{{site.baseurl}}/quickstart/example-data/plotPoints.py">Download Python Script</a>
{% highlight bash %}
# Review Input Data
Use the `plotPoints.py` tool to review the result of the data generator. [Download Python Script](quickstart/plotPoints.py)
```bash
python2.7 plotPoints.py points points input
{% endhighlight %}
```
Note: You might have to install <a href="http://matplotlib.org/">matplotlib</a> (<code>python-matplotlib</code> package on Ubuntu) to use the Python script.
Note: You might have to install [matplotlib](http://matplotlib.org/) (`python-matplotlib` package on Ubuntu) to use the Python script.
The following overview presents the impact of the different standard deviations on the input data.
<div class="row" style="padding-top:15px">
<div class="col-md-4">
<div class="text-center" style="font-weight:bold;">relative stddev = 0.03</div>
<a data-lightbox="inputs" href="{{site.baseurl}}/img/quickstart-example/kmeans003.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/kmeans003.png" /></a>
</div>
<div class="col-md-4">
<div class="text-center" style="font-weight:bold;padding-bottom:2px">relative stddev = 0.08</div>
<a data-lightbox="inputs" href="{{site.baseurl}}/img/quickstart-example/kmeans008.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/kmeans008.png" /></a>
</div>
<div class="col-md-4">
<div class="text-center" style="font-weight:bold;">relative stddev = 0.15</div>
<a data-lightbox="inputs" href="{{site.baseurl}}/img/quickstart-example/kmeans015.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/kmeans015.png" /></a>
</div>
</div>
</section>
<section id="run">
<div class="page-header">
<h2>Run Clustering</h2>
</div>
|relative stddev = 0.03|relative stddev = 0.08|relative stddev = 0.15|
|:--------------------:|:--------------------:|:--------------------:|
|<img src="img/quickstart-example/kmeans003.png" alt="example1" style="width: 275px;"/>|<img src="img/quickstart-example/kmeans008.png" alt="example2" style="width: 275px;"/>|<img src="img/quickstart-example/kmeans015.png" alt="example3" style="width: 275px;"/>|
# Run Clustering
We are using the generated input data to run the clustering using a Stratosphere job.
{% highlight bash %}
# go to the Stratosphere-root directory
cd stratosphere
# start Stratosphere (use ./bin/start-cluster.sh if you're on a cluster)
./bin/start-local.sh
# Start Stratosphere web client
./bin/start-webclient.sh
{% endhighlight %}
<h2>Review Stratosphere Compiler</h2>
# go to the Stratosphere-root directory
cd stratosphere
# start Stratosphere (use ./bin/start-cluster.sh if you're on a cluster)
./bin/start-local.sh
# Start Stratosphere web client
./bin/start-webclient.sh
# Review Stratosphere Compiler
The Stratosphere webclient allows you to submit Stratosphere programs using a graphical user interface.
<div class="row" style="padding-top:15px">
......@@ -85,19 +65,19 @@ The Stratosphere webclient allows to submit Stratosphere programs using a graphi
</div>
<div class="col-md-6">
1. <a href="http://localhost:8080/launch.html">Open webclient on localhost:8080</a> <br>
2. Upload the
{% highlight bash %}
examples/stratosphere-java-examples-0.5-SNAPSHOT-KMeansIterative.jar
{% endhighlight %} file.<br>
2. Upload the file.
{% highlight bash %}
examples/stratosphere-java-examples-0.5-SNAPSHOT-KMeansIterative.jar
{% endhighlight %} <br>
3. Select it in the left box to see how the operators in the plan are connected to each other. <br>
4. Enter the arguments in the lower left box:
{% highlight bash %}
file://<pathToGenerated>points file://<pathToGenerated>centers file://<pathToGenerated>result 10
{% endhighlight %}
For example:
{% highlight bash %}
file:///tmp/stratosphere/kmeans/points file:///tmp/stratosphere/kmeans/centers file:///tmp/stratosphere/kmeans/result 20
{% endhighlight %}
{% highlight bash %}
file://<pathToGenerated>points file://<pathToGenerated>centers file://<pathToGenerated>result 10
{% endhighlight %}
For example:
{% highlight bash %}
file:///tmp/stratosphere/kmeans/points file:///tmp/stratosphere/kmeans/centers file:///tmp/stratosphere/kmeans/result 20
{% endhighlight %}
</div>
</div>
<hr>
......@@ -122,33 +102,19 @@ file:///tmp/stratosphere/kmeans/points file:///tmp/stratosphere/kmeans/centers f
3. Once the job has finished, you can analyze the runtime of the individual operators.
</div>
</div>
</section>
<section id="result">
<div class="page-header">
<h2>Analyze the Result</h2>
</div>
Use the <a href="{{site.baseurl}}/quickstart/example-data/plotPoints.py">Python Script</a> again to visualize the result
{% highlight bash %}
# Analyze the Result
Use the [Python Script]({{site.baseurl}}/quickstart/plotPoints.py) again to visualize the result:
```bash
python2.7 plotPoints.py result result result-pdf
{% endhighlight %}
```
The following three pictures show the results for the sample input above. Play around with the parameters (number of iterations, number of clusters) to see how they affect the result.
<div class="row" style="padding-top:15px">
<div class="col-md-4">
<div class="text-center" style="font-weight:bold;">relative stddev = 0.03</div>
<a data-lightbox="results" href="{{site.baseurl}}/img/quickstart-example/result003.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/result003.png" /></a>
</div>
<div class="col-md-4">
<div class="text-center" style="font-weight:bold;">relative stddev = 0.08</div>
<a data-lightbox="results" href="{{site.baseurl}}/img/quickstart-example/result008.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/result008.png" /></a>
</div>
<div class="col-md-4">
<div class="text-center" style="font-weight:bold;">relative stddev = 0.15</div>
<a data-lightbox="results" href="{{site.baseurl}}/img/quickstart-example/result015.png" data-lightbox="example-1"><img class="img-responsive" src="{{site.baseurl}}/img/quickstart-example/result015.png" /></a>
</div>
</div>
</section>
|relative stddev = 0.03|relative stddev = 0.08|relative stddev = 0.15|
|:--------------------:|:--------------------:|:--------------------:|
|<img src="img/quickstart-example/result003.png" alt="example1" style="width: 275px;"/>|<img src="img/quickstart-example/result008.png" alt="example2" style="width: 275px;"/>|<img src="img/quickstart-example/result015.png" alt="example3" style="width: 275px;"/>|
\ No newline at end of file
......@@ -2,131 +2,105 @@
title: "Quickstart: Setup"
---
<p class="lead">Get Stratosphere up and running in a few simple steps.</p>
<section id="requirements">
<div class="page-header"><h2>Requirements</h2></div>
<p class="lead">Stratosphere runs on all <em>UNIX-like</em> environments: <strong>Linux</strong>, <strong>Mac OS X</strong>, <strong>Cygwin</strong>. The only requirement is to have a working <strong>Java 6.x</strong> (or higher) installation.</p>
</section>
<section id="download">
<div class="page-header"><h2>Download</h2></div>
<p class="lead">Download the ready to run binary package. Choose the Stratosphere distribution that <strong>matches your Hadoop version</strong>. If you are unsure which version to choose or you just want to run locally, pick the package for Hadoop 1.2.</p>
<p>
<ul class="nav nav-tabs">
<li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li>
<li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li>
</ul>
<div class="tab-content text-center">
<div class="tab-pane active" id="bin-hadoop1">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="{{site.current_stable_dl}}"><i class="icon-download"> </i> Download Stratosphere for Hadoop 1.2</a>
</div>
<div class="tab-pane" id="bin-hadoop2">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="{{site.current_stable_dl_yarn}}"><i class="icon-download"> </i> Download Stratosphere for Hadoop 2 (YARN)</a>
</div>
</div>
</p>
</section>
<section id="start">
<div class="page-header"><h2>Start</h2></div>
<p class="lead">You are almost done.</p>
<ol>
<li class="lead"><strong>Go to the download directory</strong>,</li>
<li class="lead"><strong>Unpack the downloaded archive</strong>, and</li>
<li class="lead"><strong>Start Stratosphere</strong>.</li>
</ol>
{% highlight bash %}
Get Stratosphere up and running in a few simple steps.
# Requirements
Stratosphere runs on all __UNIX-like__ environments: __Linux__, __Mac OS X__, __Cygwin__. The only requirement is to have a working __Java 6.x__ (or higher) installation.
# Download
Download the ready-to-run binary package. Choose the Stratosphere distribution that __matches your Hadoop version__. If you are unsure which version to choose or you just want to run locally, pick the package for Hadoop 1.2.
<ul class="nav nav-tabs">
<li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li>
<li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li>
</ul>
<div class="tab-content text-center">
<div class="tab-pane active" id="bin-hadoop1">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="{{site.current_stable_dl}}"><i class="icon-download"> </i> Download Stratosphere for Hadoop 1.2</a>
</div>
<div class="tab-pane" id="bin-hadoop2">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="{{site.current_stable_dl_yarn}}"><i class="icon-download"> </i> Download Stratosphere for Hadoop 2 (YARN)</a>
</div>
</div>
</p>
# Start
You are almost done.
1. Go to the download directory.
2. Unpack the downloaded archive.
3. Start Stratosphere.
```bash
$ cd ~/Downloads # Go to download directory
$ tar xzf stratosphere-*.tgz # Unpack the downloaded archive
$ cd stratosphere
$ bin/start-local.sh # Start Stratosphere
{% endhighlight %}
```
<p class="lead">Check the <strong>JobManager's web frontend</strong> at <a href="http://localhost:8081">http://localhost:8081</a> and make sure everything is up and running.</p>
</section>
Check the __JobManager's web frontend__ at [http://localhost:8081](http://localhost:8081) and make sure everything is up and running.
<section id="example">
<div class="page-header"><h2>Run Example</h2></div>
<p class="lead">Run the <strong>Word Count example</strong> to see Stratosphere at work.</p>
# Run Example
<ol>
<li class="lead"><strong>Download test data:</strong>
{% highlight bash %}
Run the __Word Count example__ to see Stratosphere at work.
* __Download test data__:
```bash
$ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt
{% endhighlight %}
You now have a text file called <em>hamlet.txt</em> in your working directory.
</li>
<li class="lead"><strong>Start the example program</strong>:
{% highlight bash %}
```
* You now have a text file called _hamlet.txt_ in your working directory.
* __Start the example program__:
```bash
$ bin/stratosphere run \
--jarfile ./examples/stratosphere-java-examples-{{site.current_stable}}-WordCount.jar \
--arguments file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
{% endhighlight %}
You will find a file called <strong>wordcount-result.txt</strong> in your current directory.
</li>
</ol>
</section>
<section id="cluster">
<div class="page-header"><h2>Cluster Setup</h2></div>
<p class="lead"><strong>Running Stratosphere on a cluster</strong> is as easy as running it locally. Having <strong>passwordless SSH</strong> and <strong>the same directory structure</strong> on all your cluster nodes lets you use our scripts to control everything.</p>
<ol>
<li class="lead">Copy the unpacked <strong>stratosphere</strong> directory from the downloaded archive to the same file system path on each node of your setup.</li>
<li class="lead">Choose a <strong>master node</strong> (JobManager) and set the <code>jobmanager.rpc.address</code> key in <code>conf/stratosphere-conf.yaml</code> to its IP or hostname. Make sure that all nodes in your cluster have the same <code>jobmanager.rpc.address</code> configured.</li>
<li class="lead">Add the IPs or hostnames (one per line) of all <strong>worker nodes</strong> (TaskManager) to the slaves files in <code>conf/slaves</code>.</li>
</ol>
<p class="lead">You can now <strong>start the cluster</strong> at your master node with <code>bin/start-cluster.sh</code>.</p>
<p class="lead">
The following <strong>example</strong> illustrates the setup with three nodes (with IP addresses from <em>10.0.0.1</em> to <em>10.0.0.3</em> and hostnames <em>master</em>, <em>worker1</em>, <em>worker2</em>) and shows the contents of the configuration files, which need to be accessible at the same path on all machines:
</p>
```
* You will find a file called __wordcount-result.txt__ in your current directory.
# Cluster Setup
__Running Stratosphere on a cluster__ is as easy as running it locally. Having __passwordless SSH__ and __the same directory structure__ on all your cluster nodes lets you use our scripts to control everything.
1. Copy the unpacked __stratosphere__ directory from the downloaded archive to the same file system path on each node of your setup.
2. Choose a __master node__ (JobManager) and set the `jobmanager.rpc.address` key in `conf/stratosphere-conf.yaml` to its IP or hostname. Make sure that all nodes in your cluster have the same `jobmanager.rpc.address` configured.
3. Add the IPs or hostnames (one per line) of all __worker nodes__ (TaskManager) to the slaves file `conf/slaves`.
You can now __start the cluster__ at your master node with `bin/start-cluster.sh`.
The following __example__ illustrates the setup with three nodes (with IP addresses from _10.0.0.1_ to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the configuration files, which need to be accessible at the same path on all machines:
<div class="row">
<div class="col-md-6 text-center">
<img src="{{ site.baseurl }}/img/quickstart_cluster.png" style="width: 85%">
</div>
<div class="col-md-6">
<div class="row">
<div class="col-md-6 text-center">
<img src="{{ site.baseurl }}/img/quickstart_cluster.png" style="width: 85%">
</div>
<div class="col-md-6">
<div class="row">
<p class="lead text-center">
/path/to/<strong>stratosphere/conf/<br>stratosphere-conf.yaml</strong>
<pre>
jobmanager.rpc.address: 10.0.0.1
</pre>
</p>
</div>
<div class="row" style="margin-top: 1em;">
<p class="lead text-center">
/path/to/<strong>stratosphere/<br>conf/slaves</strong>
<pre>
10.0.0.2
10.0.0.3
</pre>
</p>
</div>
</div>
<p class="lead text-center">
/path/to/<strong>stratosphere/conf/<br>stratosphere-conf.yaml</strong>
<pre>jobmanager.rpc.address: 10.0.0.1</pre>
</p>
</div>
</section>
<section id="yarn">
<div class="page-header"><h2>Stratosphere on YARN</h2></div>
<p class="lead">You can easily deploy Stratosphere on your existing <strong>YARN cluster</strong>.
<ol>
<li class="lead">Download the <strong>Stratosphere YARN package</strong> with the YARN client:
<div class="text-center" style="padding: 1em;">
<a style="padding-left:10px" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-yarn',this.href]);" class="btn btn-info btn-lg" href="{{site.current_stable_uberjar}}"><i class="icon-download"> </i> Stratosphere {{ site.current_stable }} for YARN</a>
</div>
</li>
<li class="lead">Make sure your <strong>HADOOP_HOME</strong> (or <em>YARN_CONF_DIR</em> or <em>HADOOP_CONF_DIR</em>) <strong>environment variable</strong> is set to read your YARN and HDFS configuration.</li>
<li class="lead">Run the <strong>YARN client</strong> with:
<div class="text-center" style="padding:1em;">
<code>./bin/yarn-session.sh</code>
</div>
You can run the client with options <code>-n 10 -tm 8192</code> to allocate 10 TaskManagers with 8GB of memory each.</li>
</ol>
<div class="row" style="margin-top: 1em;">
<p class="lead text-center">
/path/to/<strong>stratosphere/<br>conf/slaves</strong>
<pre>
10.0.0.2
10.0.0.3
</pre>
</p>
</section>
</div>
</div>
</div>
# Stratosphere on YARN
You can easily deploy Stratosphere on your existing __YARN cluster__.
1. Download the __Stratosphere YARN package__ with the YARN client: [Stratosphere for YARN]({{site.current_stable_uberjar}})
2. Make sure your __HADOOP_HOME__ (or _YARN_CONF_DIR_ or _HADOOP_CONF_DIR_) __environment variable__ is set to read your YARN and HDFS configuration.
3. Run the __YARN client__ with: `./bin/yarn-session.sh`. You can run the client with options `-n 10 -tm 8192` to allocate 10 TaskManagers with 8 GB of memory each.
<hr />
<p class="lead">For <strong>more detailed instructions</strong>, check out the <a href="{{site.baseurl}}/docs/{{site.current_stable_documentation}}">Documentation</a>.</p>
\ No newline at end of file
For __more detailed instructions__, check out the programming guides and examples.
\ No newline at end of file