Commit 7af127ea, authored by Fabian Hueske

Updated run-example quickstart (commands, screenshots)

This closes #136
Parent 6b6dae02
......@@ -5,22 +5,20 @@ title: "Quick Start: Run K-Means Example"
* This will be replaced by the TOC
{:toc}
This guide will demonstrate Flink's features by example. You will see how you can leverage Flink's Iteration-feature to find clusters in a dataset using [K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering).
On the way, you will see the compiler, the status interface and the result of the algorithm.
This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. On the way, you will see a visualization of the program and the optimized execution plan, and track the progress of its execution.
## Setup Flink
Follow the [instructions](setup_quickstart.html) to set up Flink and enter the root directory of your Flink setup.
## Generate Input Data
## Generate Input Data
Flink contains a data generator for K-Means.
~~~bash
# Download Flink
wget {{ site.FLINK_DOWNLOAD_URL_HADOOP_1_STABLE }}
tar xzf flink-*.tgz
cd flink-*
# Assuming you are in the root directory of your Flink setup
mkdir kmeans
cd kmeans
# Run data generator
java -cp ../examples/flink-java-examples-{{ site.FLINK_VERSION_STABLE }}-KMeans.jar org.apache.flink.example.java.clustering.util.KMeansDataGenerator 500 10 0.08
java -cp ../examples/flink-java-examples-*-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08
cp /tmp/points .
cp /tmp/centers .
~~~
......@@ -31,19 +29,18 @@ The generator has the following arguments:
KMeansDataGenerator <numberOfDataPoints> <numberOfClusterCenters> [<relative stddev>] [<centroid range>] [<seed>]
~~~
The _relative standard deviation_ is an interesting tuning parameter: it determines the closeness of the points to the centers.
The _relative standard deviation_ is an interesting tuning parameter. It determines the closeness of the points to randomly generated centers.
The `kmeans/` directory should now contain two files: `centers` and `points`.
The `kmeans/` directory should now contain two files: `centers` and `points`. The `points` file contains the points to cluster and the `centers` file contains initial cluster centers.
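To get a feel for how the relative standard deviation changes the data, here is a minimal Python sketch. It is not Flink's `KMeansDataGenerator`: the function names, the fixed centers, and the assumption that the absolute spread is the relative standard deviation times the value range are all illustrative.

```python
import math
import random


def generate_points(num_points, centers, relative_stddev, value_range=100.0, seed=42):
    """Scatter 2D points around the given centers (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: the absolute spread scales with the overall value range.
    stddev = relative_stddev * value_range
    points = []
    for i in range(num_points):
        # Assign points to centers round-robin so every center gets a share.
        cx, cy = centers[i % len(centers)]
        points.append((rng.gauss(cx, stddev), rng.gauss(cy, stddev)))
    return points


def mean_distance_to_nearest_center(points, centers):
    """Average Euclidean distance from each point to its closest center."""
    total = 0.0
    for px, py in points:
        total += min(math.hypot(px - cx, py - cy) for cx, cy in centers)
    return total / len(points)


# Hypothetical centers, chosen by hand for this sketch.
centers = [(20.0, 20.0), (80.0, 30.0), (50.0, 80.0)]
tight = generate_points(500, centers, 0.03)
loose = generate_points(500, centers, 0.15)

print("relative stddev 0.03:", mean_distance_to_nearest_center(tight, centers))
print("relative stddev 0.15:", mean_distance_to_nearest_center(loose, centers))
```

A smaller value keeps the points closer to their centers, which generally makes the clusters easier to separate.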
## Review Input Data
Use the `plotPoints.py` tool to review the result of the data generator. [Download Python Script](quickstart/plotPoints.py)
## Inspect the Input Data
Use the `plotPoints.py` tool to review the generated data points. [Download Python Script](quickstart/plotPoints.py)
~~~ bash
python plotPoints.py points points input
python plotPoints.py points ./points input
~~~
Note: You might have to install [matplotlib](http://matplotlib.org/) (`python-matplotlib` package on Ubuntu) to use the Python script.
You can review the input data stored in `input-plot.pdf`, for example with Evince (`evince input-plot.pdf`).
......@@ -55,37 +52,39 @@ The following overview presents the impact of the different standard deviations
|<img src="img/quickstart-example/kmeans003.png" alt="example1" style="width: 275px;"/>|<img src="img/quickstart-example/kmeans008.png" alt="example2" style="width: 275px;"/>|<img src="img/quickstart-example/kmeans015.png" alt="example3" style="width: 275px;"/>|
## Run Clustering
We are using the generated input data to run the clustering using a Flink job.
## Start Flink
Start Flink and the web job submission client on your local machine.
# go to the Flink-root directory
cd flink
# start Flink (use ./bin/start-cluster.sh if you're on a cluster)
./bin/start-local.sh
# Start Flink web client
./bin/start-webclient.sh
~~~ bash
# return to the Flink root directory
cd ..
# start Flink
./bin/start-local.sh
# Start the web client
./bin/start-webclient.sh
~~~
## Review Flink Compiler
The Flink webclient allows to submit Flink programs using a graphical user interface.
## Inspect and Run the K-Means Example Program
The Flink web client lets you submit Flink programs through a graphical user interface.
<div class="row" style="padding-top:15px">
<div class="col-md-6">
<a data-lightbox="compiler" href="img/quickstart-example/run-webclient.png" data-lightbox="example-1"><img class="img-responsive" src="img/quickstart-example/run-webclient.png" /></a>
</div>
<div class="col-md-6">
1. <a href="http://localhost:8080/launch.html">Open webclient on localhost:8080</a> <br>
2. Upload the file.
1. Open web client on <a href="http://localhost:8080/launch.html">localhost:8080</a> <br>
2. Upload the K-Means job JAR file.
{% highlight bash %}
examples/flink-java-examples-{{ site.FLINK_VERSION_STABLE }}-KMeans.jar
./examples/flink-java-examples-*-KMeans.jar
{% endhighlight %} </br>
3. Select it in the left box to see how the operators in the plan are connected to each other. <br>
4. Enter the arguments in the lower left box:
{% highlight bash %}
file://<pathToGenerated>points file://<pathToGenerated>centers file://<pathToGenerated>result 10
file://<pathToFlink>/kmeans/points file://<pathToFlink>/kmeans/centers file://<pathToFlink>/kmeans/result 10
{% endhighlight %}
For example:
{% highlight bash %}
file:///tmp/flink/kmeans/points file:///tmp/flink/kmeans/centers file:///tmp/flink/kmeans/result 20
file:///tmp/flink/kmeans/points file:///tmp/flink/kmeans/centers file:///tmp/flink/kmeans/result 10
{% endhighlight %}
</div>
</div>
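The argument string for the web client can also be assembled with a small script. The following is a sketch only: the helper names are hypothetical, and it assumes an absolute POSIX path to the Flink root and the `kmeans/` layout used in this guide.

```python
import posixpath
from urllib.parse import quote


def to_file_uri(absolute_path):
    """Turn an absolute POSIX path into a file:// URI."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # quote() percent-encodes spaces and similar characters but keeps '/'.
    return "file://" + quote(absolute_path)


def kmeans_arguments(flink_root, iterations=10):
    """Build the '<points> <centers> <result> <iterations>' argument string."""
    base = posixpath.join(flink_root, "kmeans")
    uris = [to_file_uri(posixpath.join(base, name))
            for name in ("points", "centers", "result")]
    return " ".join(uris + [str(iterations)])


print(kmeans_arguments("/tmp/flink"))
```

For `/tmp/flink` this prints the same argument line as the example above, ready to paste into the arguments box.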
......@@ -96,7 +95,7 @@ The Flink webclient allows to submit Flink programs using a graphical user inter
</div>
<div class="col-md-6">
1. Press the <b>RunJob</b> to see the optimzer plan. <br>
1. Press the <b>RunJob</b> button to see the optimizer plan. <br>
2. Inspect the operators and see the properties (input sizes, cost estimation) determined by the optimizer.
</div>
</div>
......@@ -107,18 +106,27 @@ The Flink webclient allows to submit Flink programs using a graphical user inter
</div>
<div class="col-md-6">
1. Press the <b>Continue</b> button to start executing the job. <br>
2. <a href="http://localhost:8080/launch.html">Open Flink's monitoring interface</a> to see the job's progress.<br>
3. Once the job has finished, you can analyize the runtime of the individual operators.
2. <a href="http://localhost:8080/launch.html">Open Flink's monitoring interface</a> to see the job's progress. (Because the input data is small, the job will finish very quickly.)<br>
3. Once the job has finished, you can analyze the runtime of the individual operators.
</div>
</div>
## Shutdown Flink
Stop Flink when you are done.
## Analyze the Result
~~~ bash
# stop Flink
./bin/stop-local.sh
# Stop the Flink web client
./bin/stop-webclient.sh
~~~
Use the [Python Script](quickstart/plotPoints.py) again to visualize the result
## Analyze the Result
Use the [Python Script](quickstart/plotPoints.py) again to visualize the result.
~~~bash
python plotPoints.py result result result-pdf
cd kmeans
python plotPoints.py result ./result clusters
~~~
The following three pictures show the results for the sample input above. Play around with the parameters (number of iterations, number of clusters) to see how they affect the result.
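To see why the number of iterations and the initial centers matter, it can help to run K-Means by hand on a tiny dataset. The sketch below is a plain-Python version of the standard algorithm (assign each point to its nearest center, then move each center to the mean of its points). It is not the Flink example program, and all names in it are illustrative.

```python
import math


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, initial_centers, iterations):
    """Run a fixed number of K-Means iterations on 2D points."""
    centers = list(initial_centers)
    for _ in range(iterations):
        # Per center: running sum of x, sum of y, and the point count.
        sums = [[0.0, 0.0, 0] for _ in centers]
        for x, y in points:
            i = nearest((x, y), centers)
            sums[i][0] += x
            sums[i][1] += y
            sums[i][2] += 1
        # Move each center to the mean of its points; keep it if it has none.
        centers = [
            (sx / n, sy / n) if n else old
            for (sx, sy, n), old in zip(sums, centers)
        ]
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
print(kmeans(points, [(1.0, 1.0), (9.0, 9.0)], iterations=10))
# prints [(0.0, 1.0), (10.0, 11.0)]
```

With well-separated data like this, the centers settle after the first iteration. Noisier input, such as data generated with a larger standard deviation, typically needs more iterations.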
......