This page provides instructions on how to run Flink in a _fully distributed fashion_ on a _static_ (but possibly heterogeneous) cluster.
## Requirements
### Software Requirements
Flink runs on all _UNIX-like environments_, e.g. **Linux**, **Mac OS X**, and **Cygwin** (for Windows), and expects the cluster to consist of **one master node** and **one or more worker nodes**. Before you start to set up the system, make sure you have the following software installed **on each node**:
Go to the [downloads page](http://flink.apache.org/downloads.html) and get the ready-to-run package. Make sure to pick the Flink package **matching your Hadoop version**. If you don’t plan to use Hadoop, pick any version.
After downloading the latest release, copy the archive to your master node and extract it:
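For example, assuming the archive was downloaded to the master node (the exact file name depends on the release version and the Scala/Hadoop combination you picked):

```
tar xzf flink-*.tgz
cd flink-*
```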
Set the `jobmanager.rpc.address` key to point to your master node. You should also define the maximum amount of main memory the JVM is allowed to allocate on each node by setting the `jobmanager.heap.mb` and `taskmanager.heap.mb` keys.
These values are given in MB. If some worker nodes have more main memory that you want to allocate to the Flink system, you can override the default value by setting the environment variable `FLINK_TM_HEAP` on those specific nodes.
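A minimal sketch of the corresponding entries in `conf/flink-conf.yaml`; the host name and heap sizes are placeholders you should adjust to your own setup:

```
jobmanager.rpc.address: master
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048
```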
Finally, you must provide a list of all nodes in your cluster that shall be used as worker nodes. Similar to the HDFS configuration, edit the file _conf/slaves_ and enter the IP address/host name of each worker node. Each worker node will later run a TaskManager.
The following example illustrates the setup with three nodes (with IP addresses from _10.0.0.1_ to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the configuration files (which need to be accessible at the same path on all machines):
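Under these assumptions, `conf/flink-conf.yaml` (identical on every machine) and `conf/slaves` could look like this:

```
# conf/flink-conf.yaml
jobmanager.rpc.address: 10.0.0.1

# conf/slaves (one worker host per line)
10.0.0.2
10.0.0.3
```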
The Flink directory must be available on every worker under the same path. You can use a shared NFS directory, or copy the entire Flink directory to every worker node.
Please see the [configuration page](../config.html) for details and additional configuration options.
The following script starts a JobManager on the local node and connects via SSH to all worker nodes listed in the _slaves_ file to start the TaskManager on each node. Now your Flink system is up and running. The JobManager running on the local node will now accept jobs at the configured RPC port.
Assuming that you are on the master node and inside the Flink directory:
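The cluster is brought up with the `start-cluster.sh` script from the `bin/` directory, and its `stop-cluster.sh` counterpart shuts it down again:

```
bin/start-cluster.sh
# ... and later, to shut everything down again:
bin/stop-cluster.sh
```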
The Mesos implementation consists of two components: The Application Master and the Worker. The workers are simple TaskManagers which are parameterized by the environment set up by the application master. The most sophisticated component of the Mesos implementation is the application master. The application master currently hosts the following components:
The scheduler is responsible for registering the framework with Mesos, requesting resources, and launching worker nodes. It needs to continuously report back to Mesos to ensure the framework is in a healthy state. To verify the health of the cluster, the scheduler monitors the spawned workers, marks them as failed, and restarts them if necessary.
Flink’s Mesos scheduler itself is currently not highly available. However, it persists all necessary information about its state (e.g. configuration, list of workers) in Zookeeper. In the presence of a failure, it relies on an external system to bring up a new scheduler. The scheduler will then register with Mesos again and go through the reconciliation phase. In the reconciliation phase, the scheduler receives a list of running worker nodes. It matches these against the information recovered from Zookeeper and makes sure to bring the cluster back into the state it was in before the failure.
The artifact server is responsible for providing resources to the worker nodes. The resources can be anything from the Flink binaries to shared secrets or configuration files. For instance, in non-containerized environments, the artifact server will provide the Flink binaries. What files will be served depends on the configuration overlay used.
The Dispatcher and the web interface provide a central point for monitoring, job submission, and other client interaction with the cluster (see [FLIP-6](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077)).
The startup script provides a way to configure and start the application master. All further configuration is then inherited by the worker nodes. This is achieved using configuration overlays, which provide a way to infer configuration from environment variables and config files that are shipped to the worker nodes.
This section refers to [DC/OS](https://dcos.io) which is a Mesos distribution with a sophisticated application management layer. It comes pre-installed with Marathon, a service to supervise applications and maintain their state in case of failures.
Once you have a DC/OS cluster, you may install Flink through the DC/OS Universe. In the search prompt, just search for Flink. Alternatively, you can use the DC/OS CLI:
```
dcos package install flink
```
Further information can be found in the [DC/OS examples documentation](https://github.com/dcos/examples/tree/master/1.8/flink).
After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes).
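For example, for a cluster with one master and two agents, the two files could look like this (the hostnames are placeholders):

```
# MESOS_HOME/etc/mesos/masters
master

# MESOS_HOME/etc/mesos/slaves
worker1
worker2
```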
In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. You have to configure at least `MESOS_master`, the address of the Mesos master node.
In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so` on Linux. Under Mac OS X you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.dylib`.
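As a sketch, with `MESOS_HOME` standing in for your actual Mesos installation path:

```shell
# Linux: point the JVM at the native Mesos library
export MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so

# Mac OS X: use the .dylib instead
# export MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.dylib
```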
#### Deploying Mesos
To start your Mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`; to stop it, use `MESOS_HOME/sbin/mesos-stop-cluster.sh`. More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/).
Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which enables you to run Flink in [high availability (HA) mode](#high-availability).
### Pre-installing Flink vs Docker/Mesos containers
You may install Flink on all of your Mesos Master and Agent nodes. You can also pull the binaries from the Flink web site during deployment and apply your custom configuration before launching the application master. A more convenient approach that is also easier to maintain is to use Docker containers to manage the Flink binaries and configuration.
In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster:
1. `mesos-appmaster.sh`: Starts the Mesos application master, which will register the Mesos scheduler. It is also responsible for starting up the worker nodes.
2. `mesos-taskmanager.sh`: The entry point for the Mesos worker processes. You don’t need to explicitly execute this script. It is automatically launched by the Mesos worker node to bring up a new TaskManager.
In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process.
Executing `mesos-appmaster.sh` creates a job manager on the machine where you run the script. The task managers, in contrast, will be run as Mesos tasks in the Mesos cluster.
It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. This also allows you to specify general Flink configuration parameters. For example:
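A sketch of such an invocation; the master address and the memory/slot values below are placeholder assumptions, not values prescribed by this document:

```
bin/mesos-appmaster.sh \
    -Dmesos.master=master:5050 \
    -Djobmanager.heap.mb=1024 \
    -Dtaskmanager.heap.mb=3500 \
    -Dtaskmanager.numberOfTaskSlots=2
```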
**Note:** If Flink is in [legacy mode](//ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#legacy), you should additionally define the number of task managers that are started by Mesos via [`mesos.initial-tasks`](//ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#mesos-initial-tasks).
You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs](//ci.apache.org/projects/flink/flink-docs-release-1.7/ops/jobmanager_high_availability.html).
Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In particular, it should also adjust any configuration parameters for the Flink cluster.
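A minimal Marathon application definition for this could look like the following sketch; the installation path `/opt/flink`, the master address, and the resource values are assumptions:

```
{
  "id": "flink",
  "cmd": "/opt/flink/bin/mesos-appmaster.sh -Dmesos.master=master:5050",
  "cpus": 1.0,
  "mem": 1024,
  "instances": 1
}
```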
For a list of Mesos specific configuration, refer to the [Mesos section](//ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#mesos) of the configuration documentation.
[Docker](https://www.docker.com) is a popular container runtime. There are Docker images for Apache Flink available on Docker Hub which can be used to deploy a session cluster. The Flink repository also contains tooling to create container images to deploy a job cluster.
Images for each supported combination of Hadoop and Scala are available, and tag aliases are provided for convenience.
Beginning with Flink 1.5, image tags that omit a Hadoop version suffix (such as `-hadoop28`) correspond to Hadoop-free releases of Flink that do not include a bundled Hadoop distribution.
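For illustration, pulling a Hadoop-bundled tag versus a Hadoop-free tag might look as follows; the concrete version numbers here are assumptions, so check Docker Hub for the tags that actually exist:

```
# Flink with a bundled Hadoop 2.8, Scala 2.11
docker pull flink:1.7.2-hadoop28-scala_2.11

# Hadoop-free Flink, Scala 2.11
docker pull flink:1.7.2-scala_2.11
```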
**Note:** The Docker images are provided as a community project by individuals on a best-effort basis. They are not official releases by the Apache Flink PMC.
A Flink job cluster is a dedicated cluster which runs a single job. The job is part of the image and, thus, there is no extra job submission needed.
### Docker images
The Flink job cluster image needs to contain the user code jars of the job for which the cluster is started. Therefore, one needs to build a dedicated container image for every job. The `flink-container` module contains a `build.sh` script which can be used to create such an image. Please see the [instructions](https://github.com/apache/flink/blob/release-1.7/flink-container/docker/README.md) for more details.
Example config files for a [session cluster](https://github.com/docker-flink/examples/blob/master/docker-compose.yml) and a [job cluster](https://github.com/apache/flink/blob/release-1.7/flink-container/docker/docker-compose.yml) are available on GitHub.
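A stripped-down session-cluster Compose file in the spirit of the linked example; the service names and the `JOB_MANAGER_RPC_ADDRESS` environment variable follow the conventions of the community images and are assumptions here:

```
version: "2.1"
services:
  jobmanager:
    image: flink:latest
    command: jobmanager
    ports:
      - "8081:8081"
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: flink:latest
    command: taskmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
```

Bring the cluster up with `docker-compose up -d` and tear it down again with `docker-compose down`.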
When the cluster is running, you can visit the web UI at [http://localhost:8081](http://localhost:8081). You can also use the web UI to submit a job to a session cluster.
Please follow [Kubernetes’ setup guide](https://kubernetes.io/docs/setup/) in order to deploy a Kubernetes cluster. If you want to run Kubernetes locally, we recommend using [MiniKube](https://kubernetes.io/docs/setup/minikube/).
**Note:** If using MiniKube please make sure to execute `minikube ssh 'sudo ip link set docker0 promisc on'` before deploying a Flink cluster. Otherwise Flink components are not able to reference themselves through a Kubernetes service.
## Flink session cluster on Kubernetes
A Flink session cluster is executed as a long-running Kubernetes Deployment. Note that you can run multiple Flink jobs on a session cluster. Each job needs to be submitted to the cluster after the cluster has been deployed.
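Assuming you have resource definition files for the job manager deployment, its service, and the task manager deployment at hand (the file names below are illustrative), the deployment boils down to three `kubectl` calls:

```
kubectl create -f jobmanager-deployment.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f taskmanager-deployment.yaml
```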
You can then access the Flink UI via `kubectl proxy`:
1. Run `kubectl proxy` in a terminal
2. Navigate to [http://localhost:8001/api/v1/namespaces/default/services/flink-jobmanager:ui/proxy](http://localhost:8001/api/v1/namespaces/default/services/flink-jobmanager:ui/proxy) in your browser
A Flink job cluster is a dedicated cluster which runs a single job. The job is part of the image and, thus, there is no extra job submission needed.
### Creating the job-specific image
The Flink job cluster image needs to contain the user code jars of the job for which the cluster is started. Therefore, one needs to build a dedicated container image for every job. Please follow these [instructions](https://github.com/apache/flink/blob/release-1.7/flink-container/docker/README.md) to build the Docker image.
In order to deploy a job cluster on Kubernetes, please follow these [instructions](https://github.com/apache/flink/blob/release-1.7/flink-container/kubernetes/README.md#deploy-flink-job-cluster).
The Deployment definitions use the pre-built image `flink:latest` which can be found [on Docker Hub](https://hub.docker.com/r/_/flink/). The image is built from this [GitHub repository](https://github.com/docker-flink/docker-flink).