Unverified commit 390928dd, authored by ImPerat0R_, committed by GitHub

Merge pull request #62 from zhongjiajie/remove_en_folder

Remove en folder
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Apache Airflow (incubating) Documentation</h1>
<p>From: <a href="https://airflow.apache.org/">https://airflow.apache.org/</a></p>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Project</h1>
<div class="section" id="history">
<h2 class="sigil_not_in_toc">History</h2>
<p>Airflow was started in October 2014 by Maxime Beauchemin at Airbnb.
It was open source from the very first commit and officially brought under
the Airbnb Github and announced in June 2015.</p>
<p>The project joined the Apache Software Foundation&#x2019;s incubation program in March 2016.</p>
</div>
<div class="section" id="committers">
<h2 class="sigil_not_in_toc">Committers</h2>
<ul class="simple">
<li>@mistercrunch (Maxime &#x201C;Max&#x201D; Beauchemin)</li>
<li>@r39132 (Siddharth &#x201C;Sid&#x201D; Anand)</li>
<li>@criccomini (Chris Riccomini)</li>
<li>@bolkedebruin (Bolke de Bruin)</li>
<li>@artwr (Arthur Wiedmer)</li>
<li>@jlowin (Jeremiah Lowin)</li>
<li>@patrickleotardif (Patrick Leo Tardif)</li>
<li>@aoen (Dan Davydov)</li>
<li>@syvineckruyk (Steven Yvinec-Kruyk)</li>
<li>@msumit (Sumit Maheshwari)</li>
<li>@alexvanboxel (Alex Van Boxel)</li>
<li>@saguziel (Alex Guziel)</li>
<li>@joygao (Joy Gao)</li>
<li>@fokko (Fokko Driesprong)</li>
<li>@ash (Ash Berlin-Taylor)</li>
<li>@kaxilnaik (Kaxil Naik)</li>
<li>@feng-tao (Tao Feng)</li>
</ul>
<p>For the full list of contributors, take a look at <a class="reference external" href="https://github.com/apache/incubator-airflow/graphs/contributors">Airflow&#x2019;s Github
Contributor page</a>.</p>
</div>
<div class="section" id="resources-links">
<h2 class="sigil_not_in_toc">Resources &amp; links</h2>
<ul class="simple">
<li><a class="reference external" href="http://airflow.apache.org/">Airflow&#x2019;s official documentation</a></li>
<li>Mailing list (send emails to
<code class="docutils literal notranslate"><span class="pre">dev-subscribe@airflow.incubator.apache.org</span></code> and/or
<code class="docutils literal notranslate"><span class="pre">commits-subscribe@airflow.incubator.apache.org</span></code>
to subscribe to each)</li>
<li><a class="reference external" href="https://issues.apache.org/jira/browse/AIRFLOW">Issues on Apache&#x2019;s Jira</a></li>
<li><a class="reference external" href="https://gitter.im/airbnb/airflow">Gitter (chat) Channel</a></li>
<li><a class="reference external" href="https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links">More resources and links to Airflow related content on the Wiki</a></li>
</ul>
</div>
<div class="section" id="roadmap">
<h2 class="sigil_not_in_toc">Roadmap</h2>
<p>Please refer to the Roadmap on <a class="reference external" href="https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Home">the wiki</a></p>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Managing Connections</h1>
<p>Airflow needs to know how to connect to your environment. Information
such as hostname, port, login and passwords to other systems and services is
handled in the <code class="docutils literal notranslate"><span class="pre">Admin-&gt;Connection</span></code> section of the UI. The pipeline code you
will author will reference the &#x2018;conn_id&#x2019; of the Connection objects.</p>
<img alt="https://airflow.apache.org/_images/connections.png" src="../img/b1caba93dd8fce8b3c81bfb0d58cbf95.jpg"/>
<p>Connections can be created and managed using either the UI or environment
variables.</p>
<p>See the <a class="reference internal" href="../concepts.html#concepts-connections"><span class="std std-ref">Connections Concepts</span></a> documentation for
more information.</p>
<div class="section" id="creating-a-connection-with-the-ui">
<h2 class="sigil_not_in_toc">Creating a Connection with the UI</h2>
<p>Open the <code class="docutils literal notranslate"><span class="pre">Admin-&gt;Connection</span></code> section of the UI. Click the <code class="docutils literal notranslate"><span class="pre">Create</span></code> link
to create a new connection.</p>
<img alt="https://airflow.apache.org/_images/connection_create.png" src="../img/635aacab53c55192ad3e31c28e65eb43.jpg"/>
<ol class="arabic simple">
<li>Fill in the <code class="docutils literal notranslate"><span class="pre">Conn</span> <span class="pre">Id</span></code> field with the desired connection ID. It is
recommended that you use lower-case characters and separate words with
underscores.</li>
<li>Choose the connection type with the <code class="docutils literal notranslate"><span class="pre">Conn</span> <span class="pre">Type</span></code> field.</li>
<li>Fill in the remaining fields. See
<a class="reference internal" href="#manage-connections-connection-types"><span class="std std-ref">Connection Types</span></a> for a description of the fields
belonging to the different connection types.</li>
<li>Click the <code class="docutils literal notranslate"><span class="pre">Save</span></code> button to create the connection.</li>
</ol>
</div>
<div class="section" id="editing-a-connection-with-the-ui">
<h2 class="sigil_not_in_toc">Editing a Connection with the UI</h2>
<p>Open the <code class="docutils literal notranslate"><span class="pre">Admin-&gt;Connection</span></code> section of the UI. Click the pencil icon next
to the connection you wish to edit in the connection list.</p>
<img alt="https://airflow.apache.org/_images/connection_edit.png" src="../img/08e0f3fedf871b535c850d202dda1422.jpg"/>
<p>Modify the connection properties and click the <code class="docutils literal notranslate"><span class="pre">Save</span></code> button to save your
changes.</p>
</div>
<div class="section" id="creating-a-connection-with-environment-variables">
<h2 class="sigil_not_in_toc">Creating a Connection with Environment Variables</h2>
<p>Connections in Airflow pipelines can be created using environment variables.
The environment variable needs to have a prefix of <code class="docutils literal notranslate"><span class="pre">AIRFLOW_CONN_</span></code> for
Airflow with the value in a URI format to use the connection properly.</p>
<p>When referencing the connection in the Airflow pipeline, the <code class="docutils literal notranslate"><span class="pre">conn_id</span></code>
should be the name of the variable without the prefix. For example, if the
<code class="docutils literal notranslate"><span class="pre">conn_id</span></code> is named <code class="docutils literal notranslate"><span class="pre">postgres_master</span></code> the environment variable should be
named <code class="docutils literal notranslate"><span class="pre">AIRFLOW_CONN_POSTGRES_MASTER</span></code> (note that the environment variable
must be all uppercase). Airflow assumes the value returned from the
environment variable to be in a URI format (e.g.
<code class="docutils literal notranslate"><span class="pre">postgres://user:password@localhost:5432/master</span></code> or
<code class="docutils literal notranslate"><span class="pre">s3://accesskey:secretkey@S3</span></code>).</p>
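As an illustration of the URI convention above, the following standalone sketch shows how a connection's fields can be recovered from such an environment variable. The helper name <code class="docutils literal notranslate"><span class="pre">conn_from_env</span></code> is hypothetical (not an Airflow API); Airflow performs equivalent parsing internally.

```python
# Hypothetical sketch of AIRFLOW_CONN_* URI parsing; not Airflow's own code.
import os
from urllib.parse import urlparse

os.environ["AIRFLOW_CONN_POSTGRES_MASTER"] = (
    "postgres://user:password@localhost:5432/master"
)

def conn_from_env(conn_id):
    # The env var name is the conn_id, uppercased, with the AIRFLOW_CONN_ prefix.
    uri = os.environ["AIRFLOW_CONN_" + conn_id.upper()]
    parts = urlparse(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "login": parts.username,
        "password": parts.password,
        "port": parts.port,
        "schema": parts.path.lstrip("/"),
    }

print(conn_from_env("postgres_master")["host"])  # localhost
```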
</div>
<div class="section" id="connection-types">
<span id="manage-connections-connection-types"></span><h2 class="sigil_not_in_toc">Connection Types</h2>
<div class="section" id="google-cloud-platform">
<span id="connection-type-gcp"></span><h3 class="sigil_not_in_toc">Google Cloud Platform</h3>
<p>The Google Cloud Platform connection type enables the <a class="reference internal" href="../integration.html#gcp"><span class="std std-ref">GCP Integrations</span></a>.</p>
<div class="section" id="authenticating-to-gcp">
<h4 class="sigil_not_in_toc">Authenticating to GCP</h4>
<p>There are two ways to connect to GCP using Airflow.</p>
<ol class="arabic simple">
<li>Use <a class="reference external" href="https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default">Application Default Credentials</a>,
such as via the metadata server when running on Google Compute Engine.</li>
<li>Use a <a class="reference external" href="https://cloud.google.com/docs/authentication/#service_accounts">service account</a> key
file (JSON format) on disk.</li>
</ol>
</div>
<div class="section" id="default-connection-ids">
<h4 class="sigil_not_in_toc">Default Connection IDs</h4>
<p>The following connection IDs are used by default.</p>
<pre>bigquery_default</pre>
Used by the <a class="reference internal" href="../integration.html#airflow.contrib.hooks.bigquery_hook.BigQueryHook" title="airflow.contrib.hooks.bigquery_hook.BigQueryHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">BigQueryHook</span></code></a>
hook.
<pre>google_cloud_datastore_default</pre>
Used by the <a class="reference internal" href="../integration.html#airflow.contrib.hooks.datastore_hook.DatastoreHook" title="airflow.contrib.hooks.datastore_hook.DatastoreHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">DatastoreHook</span></code></a>
hook.
<pre>google_cloud_default</pre>
Used by the
<a class="reference internal" href="../code.html#airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook" title="airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">GoogleCloudBaseHook</span></code></a>,
<a class="reference internal" href="../integration.html#airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook" title="airflow.contrib.hooks.gcp_dataflow_hook.DataFlowHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataFlowHook</span></code></a>,
<a class="reference internal" href="../code.html#airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook" title="airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">DataProcHook</span></code></a>,
<a class="reference internal" href="../integration.html#airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook" title="airflow.contrib.hooks.gcp_mlengine_hook.MLEngineHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">MLEngineHook</span></code></a>, and
<a class="reference internal" href="../integration.html#airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook" title="airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">GoogleCloudStorageHook</span></code></a> hooks.
</div>
<div class="section" id="configuring-the-connection">
<h4 class="sigil_not_in_toc">Configuring the Connection</h4>
<pre>Project Id (required)</pre>
The Google Cloud project ID to connect to.
<pre>Keyfile Path</pre>
<p class="first">Path to a <a class="reference external" href="https://cloud.google.com/docs/authentication/#service_accounts">service account</a> key
file (JSON format) on disk.</p>
<p class="last">Not required if using application default credentials.</p>
<pre>Keyfile JSON</pre>
<p class="first">Contents of a <a class="reference external" href="https://cloud.google.com/docs/authentication/#service_accounts">service account</a> key
file (JSON format) on disk. It is recommended to <a class="reference internal" href="secure-connections.html"><span class="doc">Secure your connections</span></a> if using this method to authenticate.</p>
<p class="last">Not required if using application default credentials.</p>
<pre>Scopes (comma separated)</pre>
<p class="first">A list of comma-separated <a class="reference external" href="https://developers.google.com/identity/protocols/googlescopes">Google Cloud scopes</a> to
authenticate with.</p>
<div class="last admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Scopes are ignored when using application default credentials. See
issue <a class="reference external" href="https://issues.apache.org/jira/browse/AIRFLOW-2522">AIRFLOW-2522</a>.</p>
</div>
</div>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Securing Connections</h1>
<p>By default, Airflow saves connection passwords in plain text in the
metadata database, so installing the <code class="docutils literal notranslate"><span class="pre">crypto</span></code> package is highly
recommended. The <code class="docutils literal notranslate"><span class="pre">crypto</span></code> package requires that your operating
system have libffi-dev installed.</p>
<p>If the <code class="docutils literal notranslate"><span class="pre">crypto</span></code> package was not installed initially, you can still enable encryption for
connections by following the steps below:</p>
<ol class="arabic simple">
<li>Install crypto package <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[crypto]</span></code></li>
<li>Generate a fernet_key using the code snippet below. The fernet_key must be a base64-encoded 32-byte key.</li>
</ol>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">cryptography.fernet</span> <span class="k">import</span> <span class="n">Fernet</span>
<span class="n">fernet_key</span> <span class="o">=</span> <span class="n">Fernet</span><span class="o">.</span><span class="n">generate_key</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="n">fernet_key</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span> <span class="c1"># your fernet_key, keep it in a secure place!</span>
</pre>
</div>
</div>
<p>3. Replace the <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> fernet_key value with the one from step 2.
Alternatively, you can store your fernet_key in an OS environment variable; in
that case you do not need to change <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>, as Airflow prefers the environment
variable over the value in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Note the double underscores</span>
<span class="nb">export</span> <span class="nv">AIRFLOW__CORE__FERNET_KEY</span><span class="o">=</span>your_fernet_key
</pre>
</div>
</div>
<ol class="arabic simple" start="4">
<li>Restart Airflow webserver.</li>
<li>For existing connections (the ones that you had defined before installing <code class="docutils literal notranslate"><span class="pre">airflow[crypto]</span></code> and creating a Fernet key), you need to open each connection in the connection admin UI, re-type the password, and save it.</li>
</ol>
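To make the requirement in step 2 concrete, this stdlib-only sketch (the function name is illustrative, not part of Airflow or cryptography) checks that a candidate fernet_key decodes to exactly 32 bytes of urlsafe base64:

```python
# Illustrative check of the "base64-encoded 32-byte key" requirement.
import base64
import os

def is_valid_fernet_key(key: str) -> bool:
    try:
        return len(base64.urlsafe_b64decode(key)) == 32
    except Exception:
        return False

# A key built the same way Fernet.generate_key() does: 32 random bytes,
# urlsafe-base64 encoded.
key = base64.urlsafe_b64encode(os.urandom(32)).decode()
print(is_valid_fernet_key(key))  # True
```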
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Writing Logs</h1>
<div class="section" id="writing-logs-locally">
<h2 class="sigil_not_in_toc">Writing Logs Locally</h2>
<p>Users can specify a logs folder in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> using the
<code class="docutils literal notranslate"><span class="pre">base_log_folder</span></code> setting. By default, it is in the <code class="docutils literal notranslate"><span class="pre">AIRFLOW_HOME</span></code>
directory.</p>
<p>In addition, users can supply a remote location for storing logs and log
backups in cloud storage.</p>
<p>In the Airflow Web UI, local logs take precedence over remote logs. If local logs
cannot be found or accessed, the remote logs will be displayed. Note that logs
are only sent to remote storage once a task completes (including failure). In other
words, remote logs for running tasks are unavailable. Logs are stored in the log
folder as <code class="docutils literal notranslate"><span class="pre">{dag_id}/{task_id}/{execution_date}/{try_number}.log</span></code>.</p>
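The path layout described above can be sketched as a plain format string (the concrete values below are hypothetical):

```python
# The per-try log path layout described above, as a format string.
log_template = "{dag_id}/{task_id}/{execution_date}/{try_number}.log"

path = log_template.format(
    dag_id="example_bash_operator",      # hypothetical DAG
    dag_id_unused=None,
    task_id="run_this_last",             # hypothetical task
    execution_date="2017-10-03T00:00:00",
    try_number=1,
) if False else log_template.format(
    dag_id="example_bash_operator",
    task_id="run_this_last",
    execution_date="2017-10-03T00:00:00",
    try_number=1,
)
print(path)  # example_bash_operator/run_this_last/2017-10-03T00:00:00/1.log
```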
</div>
<div class="section" id="writing-logs-to-amazon-s3">
<span id="write-logs-amazon"></span><h2 class="sigil_not_in_toc">Writing Logs to Amazon S3</h2>
<div class="section" id="before-you-begin">
<h3 class="sigil_not_in_toc">Before you begin</h3>
<p>Remote logging uses an existing Airflow connection to read/write logs. If you
don&#x2019;t have a connection properly setup, this will fail.</p>
</div>
<div class="section" id="enabling-remote-logging">
<h3 class="sigil_not_in_toc">Enabling remote logging</h3>
<p>To enable this feature, <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> must be configured as in this
example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>core<span class="o">]</span>
<span class="c1"># Airflow can store logs remotely in AWS S3. Users must supply a remote</span>
<span class="c1"># location URL (starting with &apos;s3://...&apos;) and an Airflow connection</span>
<span class="c1"># id that provides access to the storage location.</span>
<span class="nv">remote_base_log_folder</span> <span class="o">=</span> s3://my-bucket/path/to/logs
<span class="nv">remote_log_conn_id</span> <span class="o">=</span> MyS3Conn
<span class="c1"># Use server-side encryption for logs stored in S3</span>
<span class="nv">encrypt_s3_logs</span> <span class="o">=</span> False
</pre>
</div>
</div>
<p>In the above example, Airflow will try to use <code class="docutils literal notranslate"><span class="pre">S3Hook(&apos;MyS3Conn&apos;)</span></code>.</p>
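Since <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> is standard INI, the example above can be read back with Python's configparser. This sketch only illustrates the file format; it is not how Airflow itself loads configuration:

```python
# Parse the [core] remote-logging options from the example above.
import configparser

cfg_text = """
[core]
remote_base_log_folder = s3://my-bucket/path/to/logs
remote_log_conn_id = MyS3Conn
encrypt_s3_logs = False
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)
print(parser["core"]["remote_log_conn_id"])  # MyS3Conn
```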
</div>
</div>
<div class="section" id="writing-logs-to-azure-blob-storage">
<span id="write-logs-azure"></span><h2 class="sigil_not_in_toc">Writing Logs to Azure Blob Storage</h2>
<p>Airflow can be configured to read and write task logs in Azure Blob Storage.
Follow the steps below to enable Azure Blob Storage logging.</p>
<ol class="arabic">
<li><p class="first">Airflow&#x2019;s logging system requires a custom .py file to be located in the <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code>, so that it&#x2019;s importable from Airflow. Start by creating a directory to store the config file. <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/config</span></code> is recommended.</p>
</li>
<li><p class="first">Create empty files called <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/config/log_config.py</span></code> and <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/config/__init__.py</span></code>.</p>
</li>
<li><p class="first">Copy the contents of <code class="docutils literal notranslate"><span class="pre">airflow/config_templates/airflow_local_settings.py</span></code> into the <code class="docutils literal notranslate"><span class="pre">log_config.py</span></code> file that was just created in the step above.</p>
</li>
<li><p class="first">Customize the following portions of the template:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># wasb buckets should start with &quot;wasb&quot; just to help Airflow select correct handler</span>
<span class="nv">REMOTE_BASE_LOG_FOLDER</span> <span class="o">=</span> <span class="s1">&apos;wasb-&lt;whatever you want here&gt;&apos;</span>
<span class="c1"># Rename DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG</span>
<span class="nv">LOGGING_CONFIG</span> <span class="o">=</span> ...
</pre>
</div>
</div>
</div>
</blockquote>
</li>
<li><p class="first">Make sure an Azure Blob Storage (Wasb) connection hook has been defined in Airflow. The hook should have read and write access to the Azure Blob Storage bucket defined above in <code class="docutils literal notranslate"><span class="pre">REMOTE_BASE_LOG_FOLDER</span></code>.</p>
</li>
<li><p class="first">Update <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/airflow.cfg</span></code> to contain:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">remote_logging</span> <span class="o">=</span> True
<span class="nv">logging_config_class</span> <span class="o">=</span> log_config.LOGGING_CONFIG
<span class="nv">remote_log_conn_id</span> <span class="o">=</span> &lt;name of the Azure Blob Storage connection&gt;
</pre>
</div>
</div>
</div>
</blockquote>
</li>
<li><p class="first">Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.</p>
</li>
<li><p class="first">Verify that logs are showing up for newly executed tasks in the bucket you&#x2019;ve defined.</p>
</li>
</ol>
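The "wasb" prefix convention from step 4 can be sketched as a simple dispatch. The function and handler names here are illustrative, not Airflow internals:

```python
# Illustrative dispatch on the remote log folder prefix (names hypothetical).
def handler_for(remote_base_log_folder: str) -> str:
    if remote_base_log_folder.startswith("wasb"):
        return "wasb.task"   # Azure Blob Storage
    if remote_base_log_folder.startswith("s3://"):
        return "s3.task"     # Amazon S3
    if remote_base_log_folder.startswith("gs://"):
        return "gcs.task"    # Google Cloud Storage
    return "file.task"       # local filesystem fallback

print(handler_for("wasb-my-logs"))  # wasb.task
```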
</div>
<div class="section" id="writing-logs-to-google-cloud-storage">
<span id="write-logs-gcp"></span><h2 class="sigil_not_in_toc">Writing Logs to Google Cloud Storage</h2>
<p>Follow the steps below to enable Google Cloud Storage logging.</p>
<ol class="arabic">
<li><p class="first">Airflow&#x2019;s logging system requires a custom .py file to be located in the <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code>, so that it&#x2019;s importable from Airflow. Start by creating a directory to store the config file. <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/config</span></code> is recommended.</p>
</li>
<li><p class="first">Create empty files called <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/config/log_config.py</span></code> and <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/config/__init__.py</span></code>.</p>
</li>
<li><p class="first">Copy the contents of <code class="docutils literal notranslate"><span class="pre">airflow/config_templates/airflow_local_settings.py</span></code> into the <code class="docutils literal notranslate"><span class="pre">log_config.py</span></code> file that was just created in the step above.</p>
</li>
<li><p class="first">Customize the following portions of the template:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Add this variable to the top of the file. Note the trailing slash.</span>
<span class="nv">GCS_LOG_FOLDER</span> <span class="o">=</span> <span class="s1">&apos;gs://&lt;bucket where logs should be persisted&gt;/&apos;</span>
<span class="c1"># Rename DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG</span>
<span class="nv">LOGGING_CONFIG</span> <span class="o">=</span> ...
<span class="c1"># Add a GCSTaskHandler to the &apos;handlers&apos; block of the LOGGING_CONFIG variable</span>
<span class="s1">&apos;gcs.task&apos;</span>: <span class="o">{</span>
<span class="s1">&apos;class&apos;</span>: <span class="s1">&apos;airflow.utils.log.gcs_task_handler.GCSTaskHandler&apos;</span>,
<span class="s1">&apos;formatter&apos;</span>: <span class="s1">&apos;airflow.task&apos;</span>,
<span class="s1">&apos;base_log_folder&apos;</span>: os.path.expanduser<span class="o">(</span>BASE_LOG_FOLDER<span class="o">)</span>,
<span class="s1">&apos;gcs_log_folder&apos;</span>: GCS_LOG_FOLDER,
<span class="s1">&apos;filename_template&apos;</span>: FILENAME_TEMPLATE,
<span class="o">}</span>,
<span class="c1"># Update the airflow.task and airflow.task_runner blocks to be &apos;gcs.task&apos; instead of &apos;file.task&apos;.</span>
<span class="s1">&apos;loggers&apos;</span>: <span class="o">{</span>
<span class="s1">&apos;airflow.task&apos;</span>: <span class="o">{</span>
<span class="s1">&apos;handlers&apos;</span>: <span class="o">[</span><span class="s1">&apos;gcs.task&apos;</span><span class="o">]</span>,
...
<span class="o">}</span>,
<span class="s1">&apos;airflow.task_runner&apos;</span>: <span class="o">{</span>
<span class="s1">&apos;handlers&apos;</span>: <span class="o">[</span><span class="s1">&apos;gcs.task&apos;</span><span class="o">]</span>,
...
<span class="o">}</span>,
<span class="s1">&apos;airflow&apos;</span>: <span class="o">{</span>
<span class="s1">&apos;handlers&apos;</span>: <span class="o">[</span><span class="s1">&apos;console&apos;</span><span class="o">]</span>,
...
<span class="o">}</span>,
<span class="o">}</span>
</pre>
</div>
</div>
</div>
</blockquote>
</li>
<li><p class="first">Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in <code class="docutils literal notranslate"><span class="pre">GCS_LOG_FOLDER</span></code>.</p>
</li>
<li><p class="first">Update <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/airflow.cfg</span></code> to contain:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">task_log_reader</span> <span class="o">=</span> gcs.task
<span class="nv">logging_config_class</span> <span class="o">=</span> log_config.LOGGING_CONFIG
<span class="nv">remote_log_conn_id</span> <span class="o">=</span> &lt;name of the Google cloud platform hook&gt;
</pre>
</div>
</div>
</div>
</blockquote>
</li>
<li><p class="first">Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.</p>
</li>
<li><p class="first">Verify that logs are showing up for newly executed tasks in the bucket you&#x2019;ve defined.</p>
</li>
<li><p class="first">Verify that the Google Cloud Storage viewer is working in the UI. Pull up a newly executed task, and verify that you see something like:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>*** Reading remote log from gs://&lt;bucket where logs should be persisted&gt;/example_bash_operator/run_this_last/2017-10-03T00:00:00/16.log.
<span class="o">[</span><span class="m">2017</span>-10-03 <span class="m">21</span>:57:50,056<span class="o">]</span> <span class="o">{</span>cli.py:377<span class="o">}</span> INFO - Running on host chrisr-00532
<span class="o">[</span><span class="m">2017</span>-10-03 <span class="m">21</span>:57:50,093<span class="o">]</span> <span class="o">{</span>base_task_runner.py:115<span class="o">}</span> INFO - Running: <span class="o">[</span><span class="s1">&apos;bash&apos;</span>, <span class="s1">&apos;-c&apos;</span>, u<span class="s1">&apos;airflow run example_bash_operator run_this_last 2017-10-03T00:00:00 --job_id 47 --raw -sd DAGS_FOLDER/example_dags/example_bash_operator.py&apos;</span><span class="o">]</span>
<span class="o">[</span><span class="m">2017</span>-10-03 <span class="m">21</span>:57:51,264<span class="o">]</span> <span class="o">{</span>base_task_runner.py:98<span class="o">}</span> INFO - Subtask: <span class="o">[</span><span class="m">2017</span>-10-03 <span class="m">21</span>:57:51,263<span class="o">]</span> <span class="o">{</span>__init__.py:45<span class="o">}</span> INFO - Using executor SequentialExecutor
<span class="o">[</span><span class="m">2017</span>-10-03 <span class="m">21</span>:57:51,306<span class="o">]</span> <span class="o">{</span>base_task_runner.py:98<span class="o">}</span> INFO - Subtask: <span class="o">[</span><span class="m">2017</span>-10-03 <span class="m">21</span>:57:51,306<span class="o">]</span> <span class="o">{</span>models.py:186<span class="o">}</span> INFO - Filling up the DagBag from /airflow/dags/example_dags/example_bash_operator.py
</pre>
</div>
</div>
</div>
</blockquote>
</li>
</ol>
<p>Note the top line that says it&#x2019;s reading from the remote log file.</p>
<p>Please be aware that if you were persisting logs to Google Cloud Storage
using the old-style airflow.cfg configuration method, the old logs will no
longer be visible in the Airflow UI, though they&#x2019;ll still exist in Google
Cloud Storage. This is a backwards-incompatible change. If you are unhappy
with it, you can change the <code class="docutils literal notranslate"><span class="pre">FILENAME_TEMPLATE</span></code> to reflect the old-style
log filename format.</p>
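The handler/logger wiring from step 4 boils down to a standard Python logging dictConfig. In this minimal, runnable sketch, <code class="docutils literal notranslate"><span class="pre">logging.NullHandler</span></code> stands in for the real GCSTaskHandler, and the bucket name is hypothetical:

```python
# Minimal dictConfig sketch of the 'gcs.task' wiring; NullHandler is a
# stand-in for airflow.utils.log.gcs_task_handler.GCSTaskHandler.
import logging
import logging.config

GCS_LOG_FOLDER = "gs://my-log-bucket/"  # hypothetical bucket, trailing slash

LOGGING_CONFIG = {
    "version": 1,
    "handlers": {
        "gcs.task": {"class": "logging.NullHandler"},
    },
    "loggers": {
        # airflow.task now routes through the 'gcs.task' handler.
        "airflow.task": {"handlers": ["gcs.task"]},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
handlers = logging.getLogger("airflow.task").handlers
```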
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<h1>Scaling Out with Celery</h1>
<p><code class="docutils literal notranslate"><span class="pre">CeleryExecutor</span></code> is one of the ways you can scale out the number of workers. For this
to work, you need to set up a Celery backend (<strong>RabbitMQ</strong>, <strong>Redis</strong>, &#x2026;) and
change your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> to point the executor parameter to
<code class="docutils literal notranslate"><span class="pre">CeleryExecutor</span></code> and provide the related Celery settings.</p>
<p>For more information about setting up a Celery broker, refer to the
exhaustive <a class="reference external" href="http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html">Celery documentation on the topic</a>.</p>
<p>Here are a few imperative requirements for your workers:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">airflow</span></code> needs to be installed, and the CLI needs to be in the path</li>
<li>Airflow configuration settings should be homogeneous across the cluster</li>
<li>Operators that are executed on the worker need to have their dependencies
met in that context. For example, if you use the <code class="docutils literal notranslate"><span class="pre">HiveOperator</span></code>,
the hive CLI needs to be installed on that box, or if you use the
<code class="docutils literal notranslate"><span class="pre">MySqlOperator</span></code>, the required Python library needs to be available in
the <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code> somehow</li>
<li>The worker needs to have access to its <code class="docutils literal notranslate"><span class="pre">DAGS_FOLDER</span></code>, and you need to
synchronize the filesystems by your own means. A common setup would be to
store your DAGS_FOLDER in a Git repository and sync it across machines using
Chef, Puppet, Ansible, or whatever you use to configure machines in your
environment. If all your boxes have a common mount point, having your
pipelines files shared there should work as well</li>
</ul>
<p>To kick off a worker, you need to set up Airflow and run the worker
subcommand:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>airflow worker
</pre>
</div>
</div>
<p>Your worker should start picking up tasks as soon as they get fired in
its direction.</p>
<p>Note that you can also run &#x201C;Celery Flower&#x201D;, a web UI built on top of Celery,
to monitor your workers. You can use the shortcut command <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">flower</span></code>
to start a Flower web server.</p>
<p>Some caveats:</p>
<ul class="simple">
<li>Make sure to use a database-backed result backend</li>
<li>Make sure to set a visibility timeout in [celery_broker_transport_options] that exceeds the ETA of your longest running task</li>
<li>Tasks consume resources, so make sure your worker has enough resources to run <cite>worker_concurrency</cite> tasks</li>
</ul>
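<p>For example, the visibility timeout caveat above can be addressed with a setting along these lines; the value shown is an assumption, and should be chosen (in seconds) to exceed your longest task:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>[celery_broker_transport_options]
# must exceed the ETA of your longest running task (seconds)
visibility_timeout = 21600
</pre>
</div>
</div>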
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Scaling Out with Dask</h1>
<p><code class="docutils literal notranslate"><span class="pre">DaskExecutor</span></code> allows you to run Airflow tasks in a Dask Distributed cluster.</p>
<p>Dask clusters can be run on a single machine or on remote networks. For complete
details, consult the <a class="reference external" href="https://distributed.readthedocs.io/">Distributed documentation</a>.</p>
<p>To create a cluster, first start a Scheduler:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># default settings for a local cluster</span>
<span class="nv">DASK_HOST</span><span class="o">=</span><span class="m">127</span>.0.0.1
<span class="nv">DASK_PORT</span><span class="o">=</span><span class="m">8786</span>
dask-scheduler --host <span class="nv">$DASK_HOST</span> --port <span class="nv">$DASK_PORT</span>
</pre>
</div>
</div>
<p>Next start at least one Worker on any machine that can connect to the host:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>dask-worker <span class="nv">$DASK_HOST</span>:<span class="nv">$DASK_PORT</span>
</pre>
</div>
</div>
<p>Edit your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> to set your executor to <code class="docutils literal notranslate"><span class="pre">DaskExecutor</span></code> and provide
the Dask Scheduler address in the <code class="docutils literal notranslate"><span class="pre">[dask]</span></code> section.</p>
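<p>A minimal sketch of the corresponding <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> changes, assuming the local scheduler address used above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>[core]
executor = DaskExecutor

[dask]
cluster_address = 127.0.0.1:8786
</pre>
</div>
</div>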
<p>Please note:</p>
<ul class="simple">
<li>Each Dask worker must be able to import Airflow and any dependencies you
require.</li>
<li>Dask does not support queues. If an Airflow task was created with a queue, a
warning will be raised but the task will be submitted to the cluster.</li>
</ul>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Scaling Out with Mesos (community contributed)</h1>
<p>There are two ways you can run airflow as a mesos framework:</p>
<ol class="arabic simple">
<li>Running airflow tasks directly on mesos slaves, requiring each mesos slave to have airflow installed and configured.</li>
<li>Running airflow tasks inside a docker container that has airflow installed, which is run on a mesos slave.</li>
</ol>
<div class="section" id="tasks-executed-directly-on-mesos-slaves">
<h2 class="sigil_not_in_toc">Tasks executed directly on mesos slaves</h2>
<p><code class="docutils literal notranslate"><span class="pre">MesosExecutor</span></code> allows you to schedule airflow tasks on a Mesos cluster.
For this to work, you need a running mesos cluster and you must perform the following
steps:</p>
<ol class="arabic simple">
<li>Install airflow on a mesos slave where the web server and scheduler will run;
let&#x2019;s refer to this as the &#x201C;Airflow server&#x201D;.</li>
<li>On the Airflow server, install mesos python eggs from <a class="reference external" href="http://open.mesosphere.com/downloads/mesos/">mesos downloads</a>.</li>
<li>On the Airflow server, use a database (such as mysql) which can be accessed from all mesos
slaves and add configuration in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
<li>Change your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> to point the executor parameter to
<cite>MesosExecutor</cite> and provide the related Mesos settings.</li>
<li>On all mesos slaves, install airflow. Copy the <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> from the
Airflow server (so that it uses the same SQLAlchemy connection).</li>
<li>On all mesos slaves, run the following for serving logs:</li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>airflow serve_logs
</pre>
</div>
</div>
<ol class="arabic simple" start="7">
<li>On Airflow server, to start processing/scheduling DAGs on mesos, run:</li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>airflow scheduler -p
</pre>
</div>
</div>
<p>Note: the <cite>-p</cite> parameter is needed to pickle the DAGs.</p>
<p>You can now see the airflow framework and corresponding tasks in mesos UI.
The logs for airflow tasks can be seen in airflow UI as usual.</p>
<p>For more information about mesos, refer to <a class="reference external" href="http://mesos.apache.org/documentation/latest/">mesos documentation</a>.
For any queries/bugs on <cite>MesosExecutor</cite>, please contact <a class="reference external" href="https://github.com/kapil-malik">@kapil-malik</a>.</p>
</div>
<div class="section" id="tasks-executed-in-containers-on-mesos-slaves">
<h2 class="sigil_not_in_toc">Tasks executed in containers on mesos slaves</h2>
<p><a class="reference external" href="https://gist.github.com/sebradloff/f158874e615bda0005c6f4577b20036e">This gist</a> contains all files and configuration changes necessary to achieve the following:</p>
<ol class="arabic simple">
<li>Create a dockerized version of airflow with mesos python eggs installed.</li>
</ol>
<blockquote>
<div>We recommend taking advantage of docker&#x2019;s multi stage builds in order to achieve this. We have one Dockerfile that defines building a specific version of mesos from source (Dockerfile-mesos), in order to create the python eggs. In the airflow Dockerfile (Dockerfile-airflow) we copy the python eggs from the mesos image.</div>
</blockquote>
<ol class="arabic simple" start="2">
<li>Create a mesos configuration block within the <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
</ol>
<blockquote>
<div>The configuration block remains the same as the default airflow configuration (default_airflow.cfg), but has the addition of an option <code class="docutils literal notranslate"><span class="pre">docker_image_slave</span></code>. This should be set to the name of the image you would like mesos to use when running airflow tasks. Make sure you have the proper configuration of the DNS record for your mesos master and any sort of authorization if any exists.</div>
</blockquote>
<ol class="arabic simple" start="3">
<li>Change your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> to point the executor parameter to
<cite>MesosExecutor</cite> (<cite>executor = MesosExecutor</cite>).</li>
<li>Make sure your mesos slave has access to the docker repository you are using for your <code class="docutils literal notranslate"><span class="pre">docker_image_slave</span></code>.</li>
</ol>
<blockquote>
<div><a class="reference external" href="https://mesos.readthedocs.io/en/latest/docker-containerizer/#private-docker-repository">Instructions are available in the mesos docs.</a></div>
</blockquote>
<p>The rest is up to you and how you want to work with a dockerized airflow configuration.</p>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Running Airflow with systemd</h1>
<p>Airflow can integrate with systemd based systems. This makes watching your
daemons easy as systemd can take care of restarting a daemon on failure.
In the <code class="docutils literal notranslate"><span class="pre">scripts/systemd</span></code> directory you can find unit files that
have been tested on Redhat based systems. You can copy those to
<code class="docutils literal notranslate"><span class="pre">/usr/lib/systemd/system</span></code>. It is assumed that Airflow will run under
<code class="docutils literal notranslate"><span class="pre">airflow:airflow</span></code>. If not (or if you are running on a non Redhat
based system) you probably need to adjust the unit files.</p>
<p>Environment configuration is picked up from <code class="docutils literal notranslate"><span class="pre">/etc/sysconfig/airflow</span></code>.
An example file is supplied. Make sure to specify the <code class="docutils literal notranslate"><span class="pre">SCHEDULER_RUNS</span></code>
variable in this file when you run the scheduler. You
can also define here, for example, <code class="docutils literal notranslate"><span class="pre">AIRFLOW_HOME</span></code> or <code class="docutils literal notranslate"><span class="pre">AIRFLOW_CONFIG</span></code>.</p>
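<p>A hypothetical <code class="docutils literal notranslate"><span class="pre">/etc/sysconfig/airflow</span></code> along the lines of the supplied example; all paths and values are assumptions to adapt to your installation:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># illustrative values -- adjust paths to your installation
AIRFLOW_CONFIG=/home/airflow/airflow/airflow.cfg
AIRFLOW_HOME=/home/airflow/airflow
SCHEDULER_RUNS=5
</pre>
</div>
</div>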
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Running Airflow with upstart</h1>
<p>Airflow can integrate with upstart based systems. Upstart automatically starts all airflow services for which you
have a corresponding <code class="docutils literal notranslate"><span class="pre">*.conf</span></code> file in <code class="docutils literal notranslate"><span class="pre">/etc/init</span></code> upon system boot. On failure, upstart automatically restarts
the process (until it reaches re-spawn limit set in a <code class="docutils literal notranslate"><span class="pre">*.conf</span></code> file).</p>
<p>You can find sample upstart job files in the <code class="docutils literal notranslate"><span class="pre">scripts/upstart</span></code> directory. These files have been tested on
Ubuntu 14.04 LTS. You may have to adjust <code class="docutils literal notranslate"><span class="pre">start</span> <span class="pre">on</span></code> and <code class="docutils literal notranslate"><span class="pre">stop</span> <span class="pre">on</span></code> stanzas to make it work on other upstart
systems. Some of the possible options are listed in <code class="docutils literal notranslate"><span class="pre">scripts/upstart/README</span></code>.</p>
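<p>For orientation, a hypothetical minimal <code class="docutils literal notranslate"><span class="pre">airflow-webserver.conf</span></code> illustrating the stanzas mentioned above; the runlevels and respawn limits are assumptions to adapt to your system:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>description "Airflow webserver daemon"

# adjust these stanzas for your upstart system
start on runlevel [2345]
stop on runlevel [!2345]

respawn
respawn limit 5 30

setuid airflow
setgid airflow

exec airflow webserver
</pre>
</div>
</div>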
<p>Modify <code class="docutils literal notranslate"><span class="pre">*.conf</span></code> files as needed and copy to <code class="docutils literal notranslate"><span class="pre">/etc/init</span></code> directory. It is assumed that airflow will run
under <code class="docutils literal notranslate"><span class="pre">airflow:airflow</span></code>. Change <code class="docutils literal notranslate"><span class="pre">setuid</span></code> and <code class="docutils literal notranslate"><span class="pre">setgid</span></code> in <code class="docutils literal notranslate"><span class="pre">*.conf</span></code> files if you use another user/group.</p>
<p>You can use <code class="docutils literal notranslate"><span class="pre">initctl</span></code> to manually start, stop, or view the status of an airflow process that has been
integrated with upstart:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>initctl status airflow-webserver
</pre>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Using the Test Mode Configuration</h1>
<p>Airflow has a fixed set of &#x201C;test mode&#x201D; configuration options. You can load these
at any time by calling <code class="docutils literal notranslate"><span class="pre">airflow.configuration.load_test_config()</span></code> (note this
operation is not reversible!). However, some options (like the DAGS_FOLDER) are
loaded before you have a chance to call load_test_config(). In order to eagerly load
the test configuration, set unit_test_mode in airflow.cfg:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>core<span class="o">]</span>
<span class="nv">unit_test_mode</span> <span class="o">=</span> True
</pre>
</div>
</div>
<p>Due to Airflow&#x2019;s automatic environment variable expansion (see <a class="reference internal" href="set-config.html"><span class="doc">Setting Configuration Options</span></a>),
you can also set the env var <code class="docutils literal notranslate"><span class="pre">AIRFLOW__CORE__UNIT_TEST_MODE</span></code> to temporarily override
airflow.cfg.</p>
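<p>For instance, a sketch of toggling test mode for a single shell session via the environment:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># overrides [core] unit_test_mode for processes started from this shell
export AIRFLOW__CORE__UNIT_TEST_MODE=True
</pre>
</div>
</div>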
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>UI / Screenshots</h1>
<p>The Airflow UI makes it easy to monitor and troubleshoot your data pipelines.
Here&#x2019;s a quick overview of some of the features and visualizations you
can find in the Airflow UI.</p>
<div class="section" id="dags-view">
<h2 class="sigil_not_in_toc">DAGs View</h2>
<p>List of the DAGs in your environment, and a set of shortcuts to useful pages.
You can see exactly how many tasks succeeded, failed, or are currently
running at a glance.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/dags.png" src="../img/31a64f6b60a7f97f88c4b557992d0f14.jpg">
</div>
<hr class="docutils">
<div class="section" id="tree-view">
<h2 class="sigil_not_in_toc">Tree View</h2>
<p>A tree representation of the DAG that spans across time. If a pipeline is
late, you can quickly see where the different steps are and identify
the blocking ones.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/tree.png" src="../img/ad4ba22a6a3d5668fc19e0461f82e192.jpg">
</div>
<hr class="docutils">
<div class="section" id="graph-view">
<h2 class="sigil_not_in_toc">Graph View</h2>
<p>The graph view is perhaps the most comprehensive. Visualize your DAG&#x2019;s
dependencies and their current status for a specific run.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/graph.png" src="../img/bc05701b0ed4f5347e26c06452e8fd76.jpg">
</div>
<hr class="docutils">
<div class="section" id="variable-view">
<h2 class="sigil_not_in_toc">Variable View</h2>
<p>The variable view allows you to list, create, edit or delete the key-value pairs
of variables used during jobs. The value of a variable is hidden by default if the key contains
any of the words (&#x2018;password&#x2019;, &#x2018;secret&#x2019;, &#x2018;passwd&#x2019;, &#x2018;authorization&#x2019;, &#x2018;api_key&#x2019;, &#x2018;apikey&#x2019;, &#x2018;access_token&#x2019;),
but it can be configured to show in clear text.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/variable_hidden.png" src="../img/9bf73cf3f89f830e70f800145ab51b10.jpg">
</div>
<hr class="docutils">
<div class="section" id="gantt-chart">
<h2 class="sigil_not_in_toc">Gantt Chart</h2>
<p>The Gantt chart lets you analyse task duration and overlap. You can quickly
identify bottlenecks and where the bulk of the time is spent for specific
DAG runs.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/gantt.png" src="../img/cfaa010349b1e40164cabb36c3b7dc1b.jpg">
</div>
<hr class="docutils">
<div class="section" id="task-duration">
<h2 class="sigil_not_in_toc">Task Duration</h2>
<p>The duration of your different tasks over the past N runs. This view lets
you find outliers and quickly understand where the time is spent in your
DAG over many runs.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/duration.png" src="../img/f0781c3598679db6605d7dfffc65c6a9.jpg">
</div>
<hr class="docutils">
<div class="section" id="code-view">
<h2 class="sigil_not_in_toc">Code View</h2>
<p>Transparency is everything. While the code for your pipeline is in source
control, this is a quick way to get to the code that generates the DAG and
provide yet more context.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/code.png" src="../img/b732d0bdc5c1a35f3ef34cc2d14b5199.jpg">
</div>
<hr class="docutils">
<div class="section" id="task-instance-context-menu">
<h2 class="sigil_not_in_toc">Task Instance Context Menu</h2>
<p>From the pages seen above (tree view, graph view, gantt, &#x2026;), it is always
possible to click on a task instance, and get to this rich context menu
that can take you to more detailed metadata, and perform some actions.</p>
<hr class="docutils">
<img alt="https://airflow.apache.org/_images/context.png" src="../img/c6288f9767ec25b7660ae86679773f69.jpg">
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>License</h1>
<a class="reference internal image-reference" href="https://airflow.apache.org/_images/apache.jpg"><img alt="https://airflow.apache.org/_images/apache.jpg" src="../img/499e29d5e76bf2bc6edb08291ec11080.jpg" style="width: 150px;"></a>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
<span class="n">on</span> <span class="n">behalf</span> <span class="n">of</span> <span class="n">whom</span> <span class="n">a</span> <span class="n">Contribution</span> <span class="n">has</span> <span class="n">been</span> <span class="n">received</span> <span class="n">by</span> <span class="n">Licensor</span> <span class="ow">and</span>
<span class="n">subsequently</span> <span class="n">incorporated</span> <span class="n">within</span> <span class="n">the</span> <span class="n">Work</span><span class="o">.</span>
<span class="mf">2.</span> <span class="n">Grant</span> <span class="n">of</span> <span class="n">Copyright</span> <span class="n">License</span><span class="o">.</span> <span class="n">Subject</span> <span class="n">to</span> <span class="n">the</span> <span class="n">terms</span> <span class="ow">and</span> <span class="n">conditions</span> <span class="n">of</span>
<span class="n">this</span> <span class="n">License</span><span class="p">,</span> <span class="n">each</span> <span class="n">Contributor</span> <span class="n">hereby</span> <span class="n">grants</span> <span class="n">to</span> <span class="n">You</span> <span class="n">a</span> <span class="n">perpetual</span><span class="p">,</span>
<span class="n">worldwide</span><span class="p">,</span> <span class="n">non</span><span class="o">-</span><span class="n">exclusive</span><span class="p">,</span> <span class="n">no</span><span class="o">-</span><span class="n">charge</span><span class="p">,</span> <span class="n">royalty</span><span class="o">-</span><span class="n">free</span><span class="p">,</span> <span class="n">irrevocable</span>
<span class="n">copyright</span> <span class="n">license</span> <span class="n">to</span> <span class="n">reproduce</span><span class="p">,</span> <span class="n">prepare</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="n">of</span><span class="p">,</span>
<span class="n">publicly</span> <span class="n">display</span><span class="p">,</span> <span class="n">publicly</span> <span class="n">perform</span><span class="p">,</span> <span class="n">sublicense</span><span class="p">,</span> <span class="ow">and</span> <span class="n">distribute</span> <span class="n">the</span>
<span class="n">Work</span> <span class="ow">and</span> <span class="n">such</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="ow">in</span> <span class="n">Source</span> <span class="ow">or</span> <span class="n">Object</span> <span class="n">form</span><span class="o">.</span>
<span class="mf">3.</span> <span class="n">Grant</span> <span class="n">of</span> <span class="n">Patent</span> <span class="n">License</span><span class="o">.</span> <span class="n">Subject</span> <span class="n">to</span> <span class="n">the</span> <span class="n">terms</span> <span class="ow">and</span> <span class="n">conditions</span> <span class="n">of</span>
<span class="n">this</span> <span class="n">License</span><span class="p">,</span> <span class="n">each</span> <span class="n">Contributor</span> <span class="n">hereby</span> <span class="n">grants</span> <span class="n">to</span> <span class="n">You</span> <span class="n">a</span> <span class="n">perpetual</span><span class="p">,</span>
<span class="n">worldwide</span><span class="p">,</span> <span class="n">non</span><span class="o">-</span><span class="n">exclusive</span><span class="p">,</span> <span class="n">no</span><span class="o">-</span><span class="n">charge</span><span class="p">,</span> <span class="n">royalty</span><span class="o">-</span><span class="n">free</span><span class="p">,</span> <span class="n">irrevocable</span>
<span class="p">(</span><span class="k">except</span> <span class="k">as</span> <span class="n">stated</span> <span class="ow">in</span> <span class="n">this</span> <span class="n">section</span><span class="p">)</span> <span class="n">patent</span> <span class="n">license</span> <span class="n">to</span> <span class="n">make</span><span class="p">,</span> <span class="n">have</span> <span class="n">made</span><span class="p">,</span>
<span class="n">use</span><span class="p">,</span> <span class="n">offer</span> <span class="n">to</span> <span class="n">sell</span><span class="p">,</span> <span class="n">sell</span><span class="p">,</span> <span class="n">import</span><span class="p">,</span> <span class="ow">and</span> <span class="n">otherwise</span> <span class="n">transfer</span> <span class="n">the</span> <span class="n">Work</span><span class="p">,</span>
<span class="n">where</span> <span class="n">such</span> <span class="n">license</span> <span class="n">applies</span> <span class="n">only</span> <span class="n">to</span> <span class="n">those</span> <span class="n">patent</span> <span class="n">claims</span> <span class="n">licensable</span>
<span class="n">by</span> <span class="n">such</span> <span class="n">Contributor</span> <span class="n">that</span> <span class="n">are</span> <span class="n">necessarily</span> <span class="n">infringed</span> <span class="n">by</span> <span class="n">their</span>
<span class="n">Contribution</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="n">alone</span> <span class="ow">or</span> <span class="n">by</span> <span class="n">combination</span> <span class="n">of</span> <span class="n">their</span> <span class="n">Contribution</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="k">with</span> <span class="n">the</span> <span class="n">Work</span> <span class="n">to</span> <span class="n">which</span> <span class="n">such</span> <span class="n">Contribution</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="n">was</span> <span class="n">submitted</span><span class="o">.</span> <span class="n">If</span> <span class="n">You</span>
<span class="n">institute</span> <span class="n">patent</span> <span class="n">litigation</span> <span class="n">against</span> <span class="nb">any</span> <span class="n">entity</span> <span class="p">(</span><span class="n">including</span> <span class="n">a</span>
<span class="n">cross</span><span class="o">-</span><span class="n">claim</span> <span class="ow">or</span> <span class="n">counterclaim</span> <span class="ow">in</span> <span class="n">a</span> <span class="n">lawsuit</span><span class="p">)</span> <span class="n">alleging</span> <span class="n">that</span> <span class="n">the</span> <span class="n">Work</span>
<span class="ow">or</span> <span class="n">a</span> <span class="n">Contribution</span> <span class="n">incorporated</span> <span class="n">within</span> <span class="n">the</span> <span class="n">Work</span> <span class="n">constitutes</span> <span class="n">direct</span>
<span class="ow">or</span> <span class="n">contributory</span> <span class="n">patent</span> <span class="n">infringement</span><span class="p">,</span> <span class="n">then</span> <span class="nb">any</span> <span class="n">patent</span> <span class="n">licenses</span>
<span class="n">granted</span> <span class="n">to</span> <span class="n">You</span> <span class="n">under</span> <span class="n">this</span> <span class="n">License</span> <span class="k">for</span> <span class="n">that</span> <span class="n">Work</span> <span class="n">shall</span> <span class="n">terminate</span>
<span class="k">as</span> <span class="n">of</span> <span class="n">the</span> <span class="n">date</span> <span class="n">such</span> <span class="n">litigation</span> <span class="ow">is</span> <span class="n">filed</span><span class="o">.</span>
<span class="mf">4.</span> <span class="n">Redistribution</span><span class="o">.</span> <span class="n">You</span> <span class="n">may</span> <span class="n">reproduce</span> <span class="ow">and</span> <span class="n">distribute</span> <span class="n">copies</span> <span class="n">of</span> <span class="n">the</span>
<span class="n">Work</span> <span class="ow">or</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="n">thereof</span> <span class="ow">in</span> <span class="nb">any</span> <span class="n">medium</span><span class="p">,</span> <span class="k">with</span> <span class="ow">or</span> <span class="n">without</span>
<span class="n">modifications</span><span class="p">,</span> <span class="ow">and</span> <span class="ow">in</span> <span class="n">Source</span> <span class="ow">or</span> <span class="n">Object</span> <span class="n">form</span><span class="p">,</span> <span class="n">provided</span> <span class="n">that</span> <span class="n">You</span>
<span class="n">meet</span> <span class="n">the</span> <span class="n">following</span> <span class="n">conditions</span><span class="p">:</span>
<span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="n">You</span> <span class="n">must</span> <span class="n">give</span> <span class="nb">any</span> <span class="n">other</span> <span class="n">recipients</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Work</span> <span class="ow">or</span>
<span class="n">Derivative</span> <span class="n">Works</span> <span class="n">a</span> <span class="n">copy</span> <span class="n">of</span> <span class="n">this</span> <span class="n">License</span><span class="p">;</span> <span class="ow">and</span>
<span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="n">You</span> <span class="n">must</span> <span class="n">cause</span> <span class="nb">any</span> <span class="n">modified</span> <span class="n">files</span> <span class="n">to</span> <span class="n">carry</span> <span class="n">prominent</span> <span class="n">notices</span>
<span class="n">stating</span> <span class="n">that</span> <span class="n">You</span> <span class="n">changed</span> <span class="n">the</span> <span class="n">files</span><span class="p">;</span> <span class="ow">and</span>
<span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="n">You</span> <span class="n">must</span> <span class="n">retain</span><span class="p">,</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">Source</span> <span class="n">form</span> <span class="n">of</span> <span class="nb">any</span> <span class="n">Derivative</span> <span class="n">Works</span>
<span class="n">that</span> <span class="n">You</span> <span class="n">distribute</span><span class="p">,</span> <span class="nb">all</span> <span class="n">copyright</span><span class="p">,</span> <span class="n">patent</span><span class="p">,</span> <span class="n">trademark</span><span class="p">,</span> <span class="ow">and</span>
<span class="n">attribution</span> <span class="n">notices</span> <span class="kn">from</span> <span class="nn">the</span> <span class="n">Source</span> <span class="n">form</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Work</span><span class="p">,</span>
<span class="n">excluding</span> <span class="n">those</span> <span class="n">notices</span> <span class="n">that</span> <span class="n">do</span> <span class="ow">not</span> <span class="n">pertain</span> <span class="n">to</span> <span class="nb">any</span> <span class="n">part</span> <span class="n">of</span>
<span class="n">the</span> <span class="n">Derivative</span> <span class="n">Works</span><span class="p">;</span> <span class="ow">and</span>
<span class="p">(</span><span class="n">d</span><span class="p">)</span> <span class="n">If</span> <span class="n">the</span> <span class="n">Work</span> <span class="n">includes</span> <span class="n">a</span> <span class="s2">&quot;NOTICE&quot;</span> <span class="n">text</span> <span class="n">file</span> <span class="k">as</span> <span class="n">part</span> <span class="n">of</span> <span class="n">its</span>
<span class="n">distribution</span><span class="p">,</span> <span class="n">then</span> <span class="nb">any</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="n">that</span> <span class="n">You</span> <span class="n">distribute</span> <span class="n">must</span>
<span class="n">include</span> <span class="n">a</span> <span class="n">readable</span> <span class="n">copy</span> <span class="n">of</span> <span class="n">the</span> <span class="n">attribution</span> <span class="n">notices</span> <span class="n">contained</span>
<span class="n">within</span> <span class="n">such</span> <span class="n">NOTICE</span> <span class="n">file</span><span class="p">,</span> <span class="n">excluding</span> <span class="n">those</span> <span class="n">notices</span> <span class="n">that</span> <span class="n">do</span> <span class="ow">not</span>
<span class="n">pertain</span> <span class="n">to</span> <span class="nb">any</span> <span class="n">part</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Derivative</span> <span class="n">Works</span><span class="p">,</span> <span class="ow">in</span> <span class="n">at</span> <span class="n">least</span> <span class="n">one</span>
<span class="n">of</span> <span class="n">the</span> <span class="n">following</span> <span class="n">places</span><span class="p">:</span> <span class="n">within</span> <span class="n">a</span> <span class="n">NOTICE</span> <span class="n">text</span> <span class="n">file</span> <span class="n">distributed</span>
<span class="k">as</span> <span class="n">part</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Derivative</span> <span class="n">Works</span><span class="p">;</span> <span class="n">within</span> <span class="n">the</span> <span class="n">Source</span> <span class="n">form</span> <span class="ow">or</span>
<span class="n">documentation</span><span class="p">,</span> <span class="k">if</span> <span class="n">provided</span> <span class="n">along</span> <span class="k">with</span> <span class="n">the</span> <span class="n">Derivative</span> <span class="n">Works</span><span class="p">;</span> <span class="ow">or</span><span class="p">,</span>
<span class="n">within</span> <span class="n">a</span> <span class="n">display</span> <span class="n">generated</span> <span class="n">by</span> <span class="n">the</span> <span class="n">Derivative</span> <span class="n">Works</span><span class="p">,</span> <span class="k">if</span> <span class="ow">and</span>
<span class="n">wherever</span> <span class="n">such</span> <span class="n">third</span><span class="o">-</span><span class="n">party</span> <span class="n">notices</span> <span class="n">normally</span> <span class="n">appear</span><span class="o">.</span> <span class="n">The</span> <span class="n">contents</span>
<span class="n">of</span> <span class="n">the</span> <span class="n">NOTICE</span> <span class="n">file</span> <span class="n">are</span> <span class="k">for</span> <span class="n">informational</span> <span class="n">purposes</span> <span class="n">only</span> <span class="ow">and</span>
<span class="n">do</span> <span class="ow">not</span> <span class="n">modify</span> <span class="n">the</span> <span class="n">License</span><span class="o">.</span> <span class="n">You</span> <span class="n">may</span> <span class="n">add</span> <span class="n">Your</span> <span class="n">own</span> <span class="n">attribution</span>
<span class="n">notices</span> <span class="n">within</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="n">that</span> <span class="n">You</span> <span class="n">distribute</span><span class="p">,</span> <span class="n">alongside</span>
<span class="ow">or</span> <span class="k">as</span> <span class="n">an</span> <span class="n">addendum</span> <span class="n">to</span> <span class="n">the</span> <span class="n">NOTICE</span> <span class="n">text</span> <span class="kn">from</span> <span class="nn">the</span> <span class="n">Work</span><span class="p">,</span> <span class="n">provided</span>
<span class="n">that</span> <span class="n">such</span> <span class="n">additional</span> <span class="n">attribution</span> <span class="n">notices</span> <span class="n">cannot</span> <span class="n">be</span> <span class="n">construed</span>
<span class="k">as</span> <span class="n">modifying</span> <span class="n">the</span> <span class="n">License</span><span class="o">.</span>
<span class="n">You</span> <span class="n">may</span> <span class="n">add</span> <span class="n">Your</span> <span class="n">own</span> <span class="n">copyright</span> <span class="n">statement</span> <span class="n">to</span> <span class="n">Your</span> <span class="n">modifications</span> <span class="ow">and</span>
<span class="n">may</span> <span class="n">provide</span> <span class="n">additional</span> <span class="ow">or</span> <span class="n">different</span> <span class="n">license</span> <span class="n">terms</span> <span class="ow">and</span> <span class="n">conditions</span>
<span class="k">for</span> <span class="n">use</span><span class="p">,</span> <span class="n">reproduction</span><span class="p">,</span> <span class="ow">or</span> <span class="n">distribution</span> <span class="n">of</span> <span class="n">Your</span> <span class="n">modifications</span><span class="p">,</span> <span class="ow">or</span>
<span class="k">for</span> <span class="nb">any</span> <span class="n">such</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="k">as</span> <span class="n">a</span> <span class="n">whole</span><span class="p">,</span> <span class="n">provided</span> <span class="n">Your</span> <span class="n">use</span><span class="p">,</span>
<span class="n">reproduction</span><span class="p">,</span> <span class="ow">and</span> <span class="n">distribution</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Work</span> <span class="n">otherwise</span> <span class="n">complies</span> <span class="k">with</span>
<span class="n">the</span> <span class="n">conditions</span> <span class="n">stated</span> <span class="ow">in</span> <span class="n">this</span> <span class="n">License</span><span class="o">.</span>
<span class="mf">5.</span> <span class="n">Submission</span> <span class="n">of</span> <span class="n">Contributions</span><span class="o">.</span> <span class="n">Unless</span> <span class="n">You</span> <span class="n">explicitly</span> <span class="n">state</span> <span class="n">otherwise</span><span class="p">,</span>
<span class="nb">any</span> <span class="n">Contribution</span> <span class="n">intentionally</span> <span class="n">submitted</span> <span class="k">for</span> <span class="n">inclusion</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">Work</span>
<span class="n">by</span> <span class="n">You</span> <span class="n">to</span> <span class="n">the</span> <span class="n">Licensor</span> <span class="n">shall</span> <span class="n">be</span> <span class="n">under</span> <span class="n">the</span> <span class="n">terms</span> <span class="ow">and</span> <span class="n">conditions</span> <span class="n">of</span>
<span class="n">this</span> <span class="n">License</span><span class="p">,</span> <span class="n">without</span> <span class="nb">any</span> <span class="n">additional</span> <span class="n">terms</span> <span class="ow">or</span> <span class="n">conditions</span><span class="o">.</span>
<span class="n">Notwithstanding</span> <span class="n">the</span> <span class="n">above</span><span class="p">,</span> <span class="n">nothing</span> <span class="n">herein</span> <span class="n">shall</span> <span class="n">supersede</span> <span class="ow">or</span> <span class="n">modify</span>
<span class="n">the</span> <span class="n">terms</span> <span class="n">of</span> <span class="nb">any</span> <span class="n">separate</span> <span class="n">license</span> <span class="n">agreement</span> <span class="n">you</span> <span class="n">may</span> <span class="n">have</span> <span class="n">executed</span>
<span class="k">with</span> <span class="n">Licensor</span> <span class="n">regarding</span> <span class="n">such</span> <span class="n">Contributions</span><span class="o">.</span>
<span class="mf">6.</span> <span class="n">Trademarks</span><span class="o">.</span> <span class="n">This</span> <span class="n">License</span> <span class="n">does</span> <span class="ow">not</span> <span class="n">grant</span> <span class="n">permission</span> <span class="n">to</span> <span class="n">use</span> <span class="n">the</span> <span class="n">trade</span>
<span class="n">names</span><span class="p">,</span> <span class="n">trademarks</span><span class="p">,</span> <span class="n">service</span> <span class="n">marks</span><span class="p">,</span> <span class="ow">or</span> <span class="n">product</span> <span class="n">names</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Licensor</span><span class="p">,</span>
<span class="k">except</span> <span class="k">as</span> <span class="n">required</span> <span class="k">for</span> <span class="n">reasonable</span> <span class="ow">and</span> <span class="n">customary</span> <span class="n">use</span> <span class="ow">in</span> <span class="n">describing</span> <span class="n">the</span>
<span class="n">origin</span> <span class="n">of</span> <span class="n">the</span> <span class="n">Work</span> <span class="ow">and</span> <span class="n">reproducing</span> <span class="n">the</span> <span class="n">content</span> <span class="n">of</span> <span class="n">the</span> <span class="n">NOTICE</span> <span class="n">file</span><span class="o">.</span>
<span class="mf">7.</span> <span class="n">Disclaimer</span> <span class="n">of</span> <span class="n">Warranty</span><span class="o">.</span> <span class="n">Unless</span> <span class="n">required</span> <span class="n">by</span> <span class="n">applicable</span> <span class="n">law</span> <span class="ow">or</span>
<span class="n">agreed</span> <span class="n">to</span> <span class="ow">in</span> <span class="n">writing</span><span class="p">,</span> <span class="n">Licensor</span> <span class="n">provides</span> <span class="n">the</span> <span class="n">Work</span> <span class="p">(</span><span class="ow">and</span> <span class="n">each</span>
<span class="n">Contributor</span> <span class="n">provides</span> <span class="n">its</span> <span class="n">Contributions</span><span class="p">)</span> <span class="n">on</span> <span class="n">an</span> <span class="s2">&quot;AS IS&quot;</span> <span class="n">BASIS</span><span class="p">,</span>
<span class="n">WITHOUT</span> <span class="n">WARRANTIES</span> <span class="n">OR</span> <span class="n">CONDITIONS</span> <span class="n">OF</span> <span class="n">ANY</span> <span class="n">KIND</span><span class="p">,</span> <span class="n">either</span> <span class="n">express</span> <span class="ow">or</span>
<span class="n">implied</span><span class="p">,</span> <span class="n">including</span><span class="p">,</span> <span class="n">without</span> <span class="n">limitation</span><span class="p">,</span> <span class="nb">any</span> <span class="n">warranties</span> <span class="ow">or</span> <span class="n">conditions</span>
<span class="n">of</span> <span class="n">TITLE</span><span class="p">,</span> <span class="n">NON</span><span class="o">-</span><span class="n">INFRINGEMENT</span><span class="p">,</span> <span class="n">MERCHANTABILITY</span><span class="p">,</span> <span class="ow">or</span> <span class="n">FITNESS</span> <span class="n">FOR</span> <span class="n">A</span>
<span class="n">PARTICULAR</span> <span class="n">PURPOSE</span><span class="o">.</span> <span class="n">You</span> <span class="n">are</span> <span class="n">solely</span> <span class="n">responsible</span> <span class="k">for</span> <span class="n">determining</span> <span class="n">the</span>
<span class="n">appropriateness</span> <span class="n">of</span> <span class="n">using</span> <span class="ow">or</span> <span class="n">redistributing</span> <span class="n">the</span> <span class="n">Work</span> <span class="ow">and</span> <span class="n">assume</span> <span class="nb">any</span>
<span class="n">risks</span> <span class="n">associated</span> <span class="k">with</span> <span class="n">Your</span> <span class="n">exercise</span> <span class="n">of</span> <span class="n">permissions</span> <span class="n">under</span> <span class="n">this</span> <span class="n">License</span><span class="o">.</span>
<span class="mf">8.</span> <span class="n">Limitation</span> <span class="n">of</span> <span class="n">Liability</span><span class="o">.</span> <span class="n">In</span> <span class="n">no</span> <span class="n">event</span> <span class="ow">and</span> <span class="n">under</span> <span class="n">no</span> <span class="n">legal</span> <span class="n">theory</span><span class="p">,</span>
<span class="n">whether</span> <span class="ow">in</span> <span class="n">tort</span> <span class="p">(</span><span class="n">including</span> <span class="n">negligence</span><span class="p">),</span> <span class="n">contract</span><span class="p">,</span> <span class="ow">or</span> <span class="n">otherwise</span><span class="p">,</span>
<span class="n">unless</span> <span class="n">required</span> <span class="n">by</span> <span class="n">applicable</span> <span class="n">law</span> <span class="p">(</span><span class="n">such</span> <span class="k">as</span> <span class="n">deliberate</span> <span class="ow">and</span> <span class="n">grossly</span>
<span class="n">negligent</span> <span class="n">acts</span><span class="p">)</span> <span class="ow">or</span> <span class="n">agreed</span> <span class="n">to</span> <span class="ow">in</span> <span class="n">writing</span><span class="p">,</span> <span class="n">shall</span> <span class="nb">any</span> <span class="n">Contributor</span> <span class="n">be</span>
<span class="n">liable</span> <span class="n">to</span> <span class="n">You</span> <span class="k">for</span> <span class="n">damages</span><span class="p">,</span> <span class="n">including</span> <span class="nb">any</span> <span class="n">direct</span><span class="p">,</span> <span class="n">indirect</span><span class="p">,</span> <span class="n">special</span><span class="p">,</span>
<span class="n">incidental</span><span class="p">,</span> <span class="ow">or</span> <span class="n">consequential</span> <span class="n">damages</span> <span class="n">of</span> <span class="nb">any</span> <span class="n">character</span> <span class="n">arising</span> <span class="k">as</span> <span class="n">a</span>
<span class="n">result</span> <span class="n">of</span> <span class="n">this</span> <span class="n">License</span> <span class="ow">or</span> <span class="n">out</span> <span class="n">of</span> <span class="n">the</span> <span class="n">use</span> <span class="ow">or</span> <span class="n">inability</span> <span class="n">to</span> <span class="n">use</span> <span class="n">the</span>
<span class="n">Work</span> <span class="p">(</span><span class="n">including</span> <span class="n">but</span> <span class="ow">not</span> <span class="n">limited</span> <span class="n">to</span> <span class="n">damages</span> <span class="k">for</span> <span class="n">loss</span> <span class="n">of</span> <span class="n">goodwill</span><span class="p">,</span>
<span class="n">work</span> <span class="n">stoppage</span><span class="p">,</span> <span class="n">computer</span> <span class="n">failure</span> <span class="ow">or</span> <span class="n">malfunction</span><span class="p">,</span> <span class="ow">or</span> <span class="nb">any</span> <span class="ow">and</span> <span class="nb">all</span>
<span class="n">other</span> <span class="n">commercial</span> <span class="n">damages</span> <span class="ow">or</span> <span class="n">losses</span><span class="p">),</span> <span class="n">even</span> <span class="k">if</span> <span class="n">such</span> <span class="n">Contributor</span>
<span class="n">has</span> <span class="n">been</span> <span class="n">advised</span> <span class="n">of</span> <span class="n">the</span> <span class="n">possibility</span> <span class="n">of</span> <span class="n">such</span> <span class="n">damages</span><span class="o">.</span>
<span class="mf">9.</span> <span class="n">Accepting</span> <span class="n">Warranty</span> <span class="ow">or</span> <span class="n">Additional</span> <span class="n">Liability</span><span class="o">.</span> <span class="n">While</span> <span class="n">redistributing</span>
<span class="n">the</span> <span class="n">Work</span> <span class="ow">or</span> <span class="n">Derivative</span> <span class="n">Works</span> <span class="n">thereof</span><span class="p">,</span> <span class="n">You</span> <span class="n">may</span> <span class="n">choose</span> <span class="n">to</span> <span class="n">offer</span><span class="p">,</span>
<span class="ow">and</span> <span class="n">charge</span> <span class="n">a</span> <span class="n">fee</span> <span class="k">for</span><span class="p">,</span> <span class="n">acceptance</span> <span class="n">of</span> <span class="n">support</span><span class="p">,</span> <span class="n">warranty</span><span class="p">,</span> <span class="n">indemnity</span><span class="p">,</span>
<span class="ow">or</span> <span class="n">other</span> <span class="n">liability</span> <span class="n">obligations</span> <span class="ow">and</span><span class="o">/</span><span class="ow">or</span> <span class="n">rights</span> <span class="n">consistent</span> <span class="k">with</span> <span class="n">this</span>
<span class="n">License</span><span class="o">.</span> <span class="n">However</span><span class="p">,</span> <span class="ow">in</span> <span class="n">accepting</span> <span class="n">such</span> <span class="n">obligations</span><span class="p">,</span> <span class="n">You</span> <span class="n">may</span> <span class="n">act</span> <span class="n">only</span>
<span class="n">on</span> <span class="n">Your</span> <span class="n">own</span> <span class="n">behalf</span> <span class="ow">and</span> <span class="n">on</span> <span class="n">Your</span> <span class="n">sole</span> <span class="n">responsibility</span><span class="p">,</span> <span class="ow">not</span> <span class="n">on</span> <span class="n">behalf</span>
<span class="n">of</span> <span class="nb">any</span> <span class="n">other</span> <span class="n">Contributor</span><span class="p">,</span> <span class="ow">and</span> <span class="n">only</span> <span class="k">if</span> <span class="n">You</span> <span class="n">agree</span> <span class="n">to</span> <span class="n">indemnify</span><span class="p">,</span>
<span class="n">defend</span><span class="p">,</span> <span class="ow">and</span> <span class="n">hold</span> <span class="n">each</span> <span class="n">Contributor</span> <span class="n">harmless</span> <span class="k">for</span> <span class="nb">any</span> <span class="n">liability</span>
<span class="n">incurred</span> <span class="n">by</span><span class="p">,</span> <span class="ow">or</span> <span class="n">claims</span> <span class="n">asserted</span> <span class="n">against</span><span class="p">,</span> <span class="n">such</span> <span class="n">Contributor</span> <span class="n">by</span> <span class="n">reason</span>
<span class="n">of</span> <span class="n">your</span> <span class="n">accepting</span> <span class="nb">any</span> <span class="n">such</span> <span class="n">warranty</span> <span class="ow">or</span> <span class="n">additional</span> <span class="n">liability</span><span class="o">.</span>
</pre>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Concepts</h1>
<p>The Airflow Platform is a tool for describing, executing, and monitoring
workflows.</p>
<div class="section" id="core-ideas">
<h2 class="sigil_not_in_toc">Core Ideas</h2>
<div class="section" id="dags">
<h3 class="sigil_not_in_toc">DAGs</h3>
<p>In Airflow, a <code class="docutils literal notranslate"><span class="pre">DAG</span></code> &#x2013; or a Directed Acyclic Graph &#x2013; is a collection of all
the tasks you want to run, organized in a way that reflects their relationships
and dependencies.</p>
<p>For example, a simple DAG could consist of three tasks: A, B, and C. It could
say that A has to run successfully before B can run, but C can run anytime. It
could say that task A times out after 5 minutes, and B can be restarted up to 5
times in case it fails. It might also say that the workflow will run every night
at 10pm, but shouldn&#x2019;t start until a certain date.</p>
<p>In this way, a DAG describes <em>how</em> you want to carry out your workflow; but
notice that we haven&#x2019;t said anything about <em>what</em> we actually want to do! A, B,
and C could be anything. Maybe A prepares data for B to analyze while C sends an
email. Or perhaps A monitors your location so B can open your garage door while
C turns on your house lights. The important thing is that the DAG isn&#x2019;t
concerned with what its constituent tasks do; its job is to make sure that
whatever they do happens at the right time, or in the right order, or with the
right handling of any unexpected issues.</p>
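<p>The dependency structure described above can be sketched in plain Python (a hypothetical illustration of the "directed acyclic graph" idea, not Airflow code): tasks A, B, and C form a graph, and a topological sort both proves the graph is acyclic and yields a valid execution order.</p>

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical workflow: A must run before B; C can run anytime.
# Keys are tasks, values are the tasks they depend on.
dependencies = {"B": {"A"}, "C": set(), "A": set()}

order = list(TopologicalSorter(dependencies).static_order())
# "A" is guaranteed to appear before "B"; "C" may appear anywhere.
assert order.index("A") < order.index("B")
```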
<p>DAGs are defined in standard Python files that are placed in Airflow&#x2019;s
<code class="docutils literal notranslate"><span class="pre">DAG_FOLDER</span></code>. Airflow will execute the code in each file to dynamically build
the <code class="docutils literal notranslate"><span class="pre">DAG</span></code> objects. You can have as many DAGs as you want, each describing an
arbitrary number of tasks. In general, each one should correspond to a single
logical workflow.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">When searching for DAGs, Airflow will only consider files where the string
&#x201C;airflow&#x201D; and &#x201C;DAG&#x201D; both appear in the contents of the <code class="docutils literal notranslate"><span class="pre">.py</span></code> file.</p>
</div>
<div class="section" id="scope">
<h4 class="sigil_not_in_toc">Scope</h4>
<p>Airflow will load any <code class="docutils literal notranslate"><span class="pre">DAG</span></code> object it can import from a DAGfile. Critically,
that means the DAG must appear in <code class="docutils literal notranslate"><span class="pre">globals()</span></code>. Consider the following two
DAGs. Only <code class="docutils literal notranslate"><span class="pre">dag_1</span></code> will be loaded; the other one only appears in a local
scope.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">dag_1</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;this_dag_will_be_discovered&apos;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">my_function</span><span class="p">():</span>
<span class="n">dag_2</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;but_this_dag_will_not&apos;</span><span class="p">)</span>
<span class="n">my_function</span><span class="p">()</span>
</pre>
</div>
</div>
<p>Sometimes this can be put to good use. For example, a common pattern with
<code class="docutils literal notranslate"><span class="pre">SubDagOperator</span></code> is to define the subdag inside a function so that Airflow
doesn&#x2019;t try to load it as a standalone DAG.</p>
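<p>The discovery rule can be simulated with a small sketch (hypothetical, not Airflow's actual loader): only objects of the DAG type that reach the module's global namespace are collected, while objects created inside a function scope are invisible.</p>

```python
class DAG:  # stand-in for airflow.models.DAG, for illustration only
    def __init__(self, dag_id):
        self.dag_id = dag_id

namespace = {}  # plays the role of a DAG file's globals()
namespace["dag_1"] = DAG("this_dag_will_be_discovered")

def my_function():
    dag_2 = DAG("but_this_dag_will_not")  # stays local, never enters the namespace

my_function()

# Scan the namespace the way a loader might:
discovered = [obj.dag_id for obj in namespace.values() if isinstance(obj, DAG)]
assert discovered == ["this_dag_will_be_discovered"]
```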
</div>
<div class="section" id="default-arguments">
<h4 class="sigil_not_in_toc">Default Arguments</h4>
<p>If a dictionary of <code class="docutils literal notranslate"><span class="pre">default_args</span></code> is passed to a DAG, it will apply them to
any of its operators. This makes it easy to apply a common parameter to many operators without having to type it many times.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&apos;start_date&apos;</span><span class="p">:</span> <span class="n">datetime</span><span class="p">(</span><span class="mi">2016</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s1">&apos;owner&apos;</span><span class="p">:</span> <span class="s1">&apos;Airflow&apos;</span>
<span class="p">}</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_dag&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span>
<span class="n">op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;dummy&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">op</span><span class="o">.</span><span class="n">owner</span><span class="p">)</span> <span class="c1"># Airflow</span>
</pre>
</div>
</div>
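<p>Conceptually, the merge behaves like Python dict unpacking, with operator-level arguments taking precedence over the DAG's <code class="docutils literal notranslate"><span class="pre">default_args</span></code> (a simplified sketch of the behavior, not Airflow's implementation):</p>

```python
from datetime import datetime

default_args = {"start_date": datetime(2016, 1, 1), "owner": "Airflow"}

def resolve_args(defaults, explicit):
    """Explicit operator kwargs override the DAG-level defaults."""
    return {**defaults, **explicit}

# The operator supplied no 'owner', so the DAG default applies.
assert resolve_args(default_args, {"task_id": "dummy"})["owner"] == "Airflow"
# An explicit per-operator value wins over the default.
assert resolve_args(default_args, {"owner": "alice"})["owner"] == "alice"
```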
</div>
<div class="section" id="context-manager">
<h4 class="sigil_not_in_toc">Context Manager</h4>
<p><em>Added in Airflow 1.8</em></p>
<p>DAGs can be used as context managers to automatically assign new operators to that DAG.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_dag&apos;</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2016</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
<span class="n">op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="s1">&apos;op&apos;</span><span class="p">)</span>
<span class="n">op</span><span class="o">.</span><span class="n">dag</span> <span class="ow">is</span> <span class="n">dag</span> <span class="c1"># True</span>
</pre>
</div>
</div>
</div>
</div>
<div class="section" id="operators">
<span id="concepts-operators"></span><h3 class="sigil_not_in_toc">Operators</h3>
<p>While DAGs describe <em>how</em> to run a workflow, <code class="docutils literal notranslate"><span class="pre">Operators</span></code> determine what
actually gets done.</p>
<p>An operator describes a single task in a workflow. Operators are usually (but
not always) atomic, meaning they can stand on their own and don&#x2019;t need to share
resources with any other operators. The DAG will make sure that operators run in
the correct order; other than those dependencies, operators generally
run independently. In fact, they may run on two completely different machines.</p>
<p>This is a subtle but very important point: in general, if two operators need to
share information, like a filename or small amount of data, you should consider
combining them into a single operator. If it absolutely can&#x2019;t be avoided,
Airflow does have a feature for operator cross-communication called XCom that is
described elsewhere in this document.</p>
<p>Airflow provides operators for many common tasks, including:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">BashOperator</span></code> - executes a bash command</li>
<li><code class="docutils literal notranslate"><span class="pre">PythonOperator</span></code> - calls an arbitrary Python function</li>
<li><code class="docutils literal notranslate"><span class="pre">EmailOperator</span></code> - sends an email</li>
<li><code class="docutils literal notranslate"><span class="pre">SimpleHttpOperator</span></code> - sends an HTTP request</li>
<li><code class="docutils literal notranslate"><span class="pre">MySqlOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">SqliteOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">PostgresOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">MsSqlOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">OracleOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">JdbcOperator</span></code>, etc. - executes a SQL command</li>
<li><code class="docutils literal notranslate"><span class="pre">Sensor</span></code> - waits for a certain time, file, database row, S3 key, etc&#x2026;</li>
</ul>
<p>In addition to these basic building blocks, there are many more specific
operators: <code class="docutils literal notranslate"><span class="pre">DockerOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">HiveOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">S3FileTransformOperator</span></code>,
<code class="docutils literal notranslate"><span class="pre">PrestoToMysqlOperator</span></code>, <code class="docutils literal notranslate"><span class="pre">SlackOperator</span></code>&#x2026; you get the idea!</p>
<p>The <code class="docutils literal notranslate"><span class="pre">airflow/contrib/</span></code> directory contains yet more operators built by the
community. These operators aren&#x2019;t always as complete or well-tested as those in
the main distribution, but allow users to more easily add new functionality to
the platform.</p>
<p>Operators are only loaded by Airflow if they are assigned to a DAG.</p>
<p>See <a class="reference internal" href="howto/operator.html"><span class="doc">Using Operators</span></a> for how to use Airflow operators.</p>
<div class="section" id="dag-assignment">
<h4 class="sigil_not_in_toc">DAG Assignment</h4>
<p><em>Added in Airflow 1.8</em></p>
<p>Operators do not have to be assigned to DAGs immediately (previously <code class="docutils literal notranslate"><span class="pre">dag</span></code> was
a required argument). However, once an operator is assigned to a DAG, it can not
be transferred or unassigned. DAG assignment can be done explicitly when the
operator is created, through deferred assignment, or even inferred from other
operators.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_dag&apos;</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2016</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># sets the DAG explicitly</span>
<span class="n">explicit_op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;op1&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="c1"># deferred DAG assignment</span>
<span class="n">deferred_op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;op2&apos;</span><span class="p">)</span>
<span class="n">deferred_op</span><span class="o">.</span><span class="n">dag</span> <span class="o">=</span> <span class="n">dag</span>
<span class="c1"># inferred DAG assignment (linked operators must be in the same DAG)</span>
<span class="n">inferred_op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;op3&apos;</span><span class="p">)</span>
<span class="n">inferred_op</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">deferred_op</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
<div class="section" id="bitshift-composition">
<h4 class="sigil_not_in_toc">Bitshift Composition</h4>
<p><em>Added in Airflow 1.8</em></p>
<p>Traditionally, operator relationships are set with the <code class="docutils literal notranslate"><span class="pre">set_upstream()</span></code> and
<code class="docutils literal notranslate"><span class="pre">set_downstream()</span></code> methods. In Airflow 1.8, this can be done with the Python
bitshift operators <code class="docutils literal notranslate"><span class="pre">&gt;&gt;</span></code> and <code class="docutils literal notranslate"><span class="pre">&lt;&lt;</span></code>. The following four statements are all
functionally equivalent:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">op1</span> <span class="o">&gt;&gt;</span> <span class="n">op2</span>
<span class="n">op1</span><span class="o">.</span><span class="n">set_downstream</span><span class="p">(</span><span class="n">op2</span><span class="p">)</span>
<span class="n">op2</span> <span class="o">&lt;&lt;</span> <span class="n">op1</span>
<span class="n">op2</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">op1</span><span class="p">)</span>
</pre>
</div>
</div>
<p>When using the bitshift to compose operators, the relationship is set in the
direction that the bitshift operator points. For example, <code class="docutils literal notranslate"><span class="pre">op1</span> <span class="pre">&gt;&gt;</span> <span class="pre">op2</span></code> means
that <code class="docutils literal notranslate"><span class="pre">op1</span></code> runs first and <code class="docutils literal notranslate"><span class="pre">op2</span></code> runs second. Multiple operators can be
composed &#x2013; keep in mind the chain is executed left-to-right and the rightmost
object is always returned. For example:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">op1</span> <span class="o">&gt;&gt;</span> <span class="n">op2</span> <span class="o">&gt;&gt;</span> <span class="n">op3</span> <span class="o">&lt;&lt;</span> <span class="n">op4</span>
</pre>
</div>
</div>
<p>is equivalent to:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">op1</span><span class="o">.</span><span class="n">set_downstream</span><span class="p">(</span><span class="n">op2</span><span class="p">)</span>
<span class="n">op2</span><span class="o">.</span><span class="n">set_downstream</span><span class="p">(</span><span class="n">op3</span><span class="p">)</span>
<span class="n">op3</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">op4</span><span class="p">)</span>
</pre>
</div>
</div>
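<p>A minimal pure-Python sketch of how such bitshift chaining can be implemented (hypothetical, not Airflow's <code class="docutils literal notranslate"><span class="pre">BaseOperator</span></code> source): each operator returns the right-hand operand, which is what makes left-to-right chains like the one above compose correctly.</p>

```python
class Task:
    """Toy operator that tracks upstream/downstream links."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream, self.downstream = set(), set()

    def set_downstream(self, other):
        self.downstream.add(other)
        other.upstream.add(self)

    def set_upstream(self, other):
        other.set_downstream(self)

    def __rshift__(self, other):   # self >> other: other runs after self
        self.set_downstream(other)
        return other

    def __lshift__(self, other):   # self << other: other runs before self
        self.set_upstream(other)
        return other

op1, op2, op3, op4 = (Task(f"op{i}") for i in range(1, 5))
op1 >> op2 >> op3 << op4

# op3 now has two parents: op2 (via >>) and op4 (via <<).
assert {t.task_id for t in op3.upstream} == {"op2", "op4"}
```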
<p>For convenience, the bitshift operators can also be used with DAGs. For example:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">dag</span> <span class="o">&gt;&gt;</span> <span class="n">op1</span> <span class="o">&gt;&gt;</span> <span class="n">op2</span>
</pre>
</div>
</div>
<p>is equivalent to:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">op1</span><span class="o">.</span><span class="n">dag</span> <span class="o">=</span> <span class="n">dag</span>
<span class="n">op1</span><span class="o">.</span><span class="n">set_downstream</span><span class="p">(</span><span class="n">op2</span><span class="p">)</span>
</pre>
</div>
</div>
<p>We can put this all together to build a simple pipeline:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_dag&apos;</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2016</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
<span class="p">(</span>
<span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;dummy_1&apos;</span><span class="p">)</span>
<span class="o">&gt;&gt;</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;bash_1&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;echo &quot;HELLO!&quot;&apos;</span><span class="p">)</span>
<span class="o">&gt;&gt;</span> <span class="n">PythonOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;python_1&apos;</span><span class="p">,</span>
<span class="n">python_callable</span><span class="o">=</span><span class="k">lambda</span><span class="p">:</span> <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;GOODBYE!&quot;</span><span class="p">))</span>
<span class="p">)</span>
</pre>
</div>
</div>
</div>
</div>
<div class="section" id="tasks">
<h3 class="sigil_not_in_toc">Tasks</h3>
<p>Once an operator is instantiated, it is referred to as a &#x201C;task&#x201D;. The
instantiation defines specific values when calling the abstract operator, and
the parameterized task becomes a node in a DAG.</p>
</div>
<div class="section" id="task-instances">
<h3 class="sigil_not_in_toc">Task Instances</h3>
<p>A task instance represents a specific run of a task and is characterized as the
combination of a dag, a task, and a point in time. Task instances also have an
indicative state, which could be &#x201C;running&#x201D;, &#x201C;success&#x201D;, &#x201C;failed&#x201D;, &#x201C;skipped&#x201D;, &#x201C;up
for retry&#x201D;, etc.</p>
</div>
<div class="section" id="workflows">
<h3 class="sigil_not_in_toc">Workflows</h3>
<p>You&#x2019;re now familiar with the core building blocks of Airflow.
Some of the concepts may sound very similar, but the vocabulary can
be conceptualized like this:</p>
<ul class="simple">
<li>DAG: a description of the order in which work should take place</li>
<li>Operator: a class that acts as a template for carrying out some work</li>
<li>Task: a parameterized instance of an operator</li>
<li>Task Instance: a task that 1) has been assigned to a DAG and 2) has a
state associated with a specific run of the DAG</li>
</ul>
<p>By combining <code class="docutils literal notranslate"><span class="pre">DAGs</span></code> and <code class="docutils literal notranslate"><span class="pre">Operators</span></code> to create <code class="docutils literal notranslate"><span class="pre">TaskInstances</span></code>, you can
build complex workflows.</p>
</div>
</div>
<div class="section" id="additional-functionality">
<h2 class="sigil_not_in_toc">Additional Functionality</h2>
<p>In addition to the core Airflow objects, there are a number of more complex
features that enable behaviors like limiting simultaneous access to resources,
cross-communication, conditional execution, and more.</p>
<div class="section" id="hooks">
<h3 class="sigil_not_in_toc">Hooks</h3>
<p>Hooks are interfaces to external platforms and databases like Hive, S3,
MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when
possible, and act as a building block for operators. They also use
the <code class="docutils literal notranslate"><span class="pre">airflow.models.Connection</span></code> model to retrieve hostnames
and authentication information. Hooks keep authentication code and
information out of pipelines, centralized in the metadata database.</p>
<p>Hooks are also very useful to use on their own in Python scripts,
in the <code class="docutils literal notranslate"><span class="pre">airflow.operators.PythonOperator</span></code>, and in interactive environments
like iPython or Jupyter Notebook.</p>
</div>
<div class="section" id="pools">
<h3 class="sigil_not_in_toc">Pools</h3>
<p>Some systems can get overwhelmed when too many processes hit them at the same
time. Airflow pools can be used to <strong>limit the execution parallelism</strong> on
arbitrary sets of tasks. The list of pools is managed in the UI
(<code class="docutils literal notranslate"><span class="pre">Menu</span> <span class="pre">-&gt;</span> <span class="pre">Admin</span> <span class="pre">-&gt;</span> <span class="pre">Pools</span></code>) by giving the pools a name and assigning
it a number of worker slots. Tasks can then be associated with
one of the existing pools by using the <code class="docutils literal notranslate"><span class="pre">pool</span></code> parameter when
creating tasks (i.e., instantiating operators).</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">aggregate_db_message_job</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;aggregate_db_message_job&apos;</span><span class="p">,</span>
<span class="n">execution_timeout</span><span class="o">=</span><span class="n">timedelta</span><span class="p">(</span><span class="n">hours</span><span class="o">=</span><span class="mi">3</span><span class="p">),</span>
<span class="n">pool</span><span class="o">=</span><span class="s1">&apos;ep_data_pipeline_db_msg_agg&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="n">aggregate_db_message_job_cmd</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">aggregate_db_message_job</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">wait_for_empty_queue</span><span class="p">)</span>
</pre>
</div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">pool</span></code> parameter can
be used in conjunction with <code class="docutils literal notranslate"><span class="pre">priority_weight</span></code> to define priorities
in the queue, and which tasks get executed first as slots open up in the
pool. The default <code class="docutils literal notranslate"><span class="pre">priority_weight</span></code> is <code class="docutils literal notranslate"><span class="pre">1</span></code>, and can be bumped to any
number. When sorting the queue to evaluate which task should be executed
next, we use the <code class="docutils literal notranslate"><span class="pre">priority_weight</span></code>, summed up with all of the
<code class="docutils literal notranslate"><span class="pre">priority_weight</span></code> values from tasks downstream from this task. You can
use this to bump a specific important task and the whole path to that task
gets prioritized accordingly.</p>
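<p>The summing rule can be sketched as follows (a simplified model of the default weighting, not Airflow's scheduler code): a task's effective priority is its own weight plus the weights of every task downstream of it, so bumping one important task lifts the whole path leading to it.</p>

```python
def effective_priority(task, weights, downstream):
    """weights: task -> priority_weight; downstream: task -> direct children."""
    seen, stack, total = set(), [task], 0
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        total += weights[t]
        stack.extend(downstream.get(t, []))
    return total

# Hypothetical three-task pipeline; 'load' was bumped to 10.
weights = {"extract": 1, "transform": 1, "load": 10}
downstream = {"extract": ["transform"], "transform": ["load"]}

assert effective_priority("extract", weights, downstream) == 12  # 1 + 1 + 10
assert effective_priority("load", weights, downstream) == 10
```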
<p>Tasks will be scheduled as usual while the slots fill up. Once capacity is
reached, runnable tasks get queued and their state will show as such in the
UI. As slots free up, queued tasks start running based on the
<code class="docutils literal notranslate"><span class="pre">priority_weight</span></code> (of the task and its descendants).</p>
<p>Note that by default tasks aren&#x2019;t assigned to any pool and their
execution parallelism is only limited by the executor&#x2019;s settings.</p>
</div>
<div class="section" id="connections">
<span id="concepts-connections"></span><h3 class="sigil_not_in_toc">Connections</h3>
<p>The connection information to external systems is stored in the Airflow
metadata database and managed in the UI (<code class="docutils literal notranslate"><span class="pre">Menu</span> <span class="pre">-&gt;</span> <span class="pre">Admin</span> <span class="pre">-&gt;</span> <span class="pre">Connections</span></code>).
A <code class="docutils literal notranslate"><span class="pre">conn_id</span></code> is defined there, with hostname / login / password / schema
information attached to it. Airflow pipelines can simply refer to the
centrally managed <code class="docutils literal notranslate"><span class="pre">conn_id</span></code> without having to hard code any of this
information anywhere.</p>
<p>Many connections with the same <code class="docutils literal notranslate"><span class="pre">conn_id</span></code> can be defined. When that
is the case, and when a <strong>hook</strong> uses the <code class="docutils literal notranslate"><span class="pre">get_connection</span></code> method
from <code class="docutils literal notranslate"><span class="pre">BaseHook</span></code>, Airflow will choose one connection randomly, allowing
for some basic load balancing and fault tolerance when used in conjunction
with retries.</p>
<p>Airflow also has the ability to reference connections via environment
variables from the operating system, but this only supports the URI format. If you
need to specify <code class="docutils literal notranslate"><span class="pre">extra</span></code> information for your connection, please use the web UI.</p>
<p>If connections with the same <code class="docutils literal notranslate"><span class="pre">conn_id</span></code> are defined in both Airflow metadata
database and environment variables, only the one in environment variables
will be referenced by Airflow (for example, given <code class="docutils literal notranslate"><span class="pre">conn_id</span></code> <code class="docutils literal notranslate"><span class="pre">postgres_master</span></code>,
Airflow will search for <code class="docutils literal notranslate"><span class="pre">AIRFLOW_CONN_POSTGRES_MASTER</span></code>
in environment variables first and directly reference it if found,
before it starts to search in metadata database).</p>
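<p>For example, the environment variable name is the <code class="docutils literal notranslate"><span class="pre">conn_id</span></code> uppercased with an <code class="docutils literal notranslate"><span class="pre">AIRFLOW_CONN_</span></code> prefix, and its value is a connection URI (the host and credentials below are placeholders):</p>

```python
import os
from urllib.parse import urlparse

conn_id = "postgres_master"
env_var = f"AIRFLOW_CONN_{conn_id.upper()}"
# Placeholder URI; real credentials would come from your environment.
os.environ[env_var] = "postgres://user:secret@dbhost:5432/analytics"

parts = urlparse(os.environ[env_var])
assert env_var == "AIRFLOW_CONN_POSTGRES_MASTER"
assert (parts.hostname, parts.port, parts.username) == ("dbhost", 5432, "user")
```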
<p>Many hooks have a default <code class="docutils literal notranslate"><span class="pre">conn_id</span></code>, where operators using that hook do not
need to supply an explicit connection ID. For example, the default
<code class="docutils literal notranslate"><span class="pre">conn_id</span></code> for the <a class="reference internal" href="code.html#airflow.hooks.postgres_hook.PostgresHook" title="airflow.hooks.postgres_hook.PostgresHook"><code class="xref py py-class docutils literal notranslate"><span class="pre">PostgresHook</span></code></a> is
<code class="docutils literal notranslate"><span class="pre">postgres_default</span></code>.</p>
<p>See <a class="reference internal" href="howto/manage-connections.html"><span class="doc">Managing Connections</span></a> for how to create and manage connections.</p>
</div>
<div class="section" id="queues">
<h3 class="sigil_not_in_toc">Queues</h3>
<p>When using the CeleryExecutor, the celery queues that tasks are sent to
can be specified. <code class="docutils literal notranslate"><span class="pre">queue</span></code> is an attribute of BaseOperator, so any
task can be assigned to any queue. The default queue for the environment
is defined in the <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>&#x2019;s <code class="docutils literal notranslate"><span class="pre">celery</span> <span class="pre">-&gt;</span> <span class="pre">default_queue</span></code>. This defines
the queue that tasks get assigned to when not specified, as well as which
queue Airflow workers listen to when started.</p>
<p>Workers can listen to one or multiple queues of tasks. When a worker is
started (using the command <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">worker</span></code>), a set of comma delimited
queue names can be specified (e.g. <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">worker</span> <span class="pre">-q</span> <span class="pre">spark</span></code>). This worker
will then only pick up tasks wired to the specified queue(s).</p>
<p>This can be useful if you need specialized workers, either from a
resource perspective (for say very lightweight tasks where one worker
could take thousands of tasks without a problem), or from an environment
perspective (you want a worker running from within the Spark cluster
itself because it needs a very specific environment and security rights).</p>
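<p>The routing described above can be modeled with a short sketch (hypothetical, not Celery's implementation): a worker started with <code class="docutils literal notranslate"><span class="pre">-q</span> <span class="pre">spark</span></code> only picks up tasks whose <code class="docutils literal notranslate"><span class="pre">queue</span></code> attribute matches one of the queues it listens to.</p>

```python
# Hypothetical task list; 'queue' mirrors the BaseOperator attribute.
tasks = [
    {"task_id": "light_job", "queue": "default"},
    {"task_id": "spark_job", "queue": "spark"},
    {"task_id": "model_fit", "queue": "spark"},
]

def pick_up(tasks, listened_queues):
    """Return only the tasks wired to one of the worker's queues."""
    return [t["task_id"] for t in tasks if t["queue"] in listened_queues]

# Equivalent in spirit to `airflow worker -q spark`:
assert pick_up(tasks, {"spark"}) == ["spark_job", "model_fit"]
# A worker listening to comma-delimited queues, e.g. `-q default,spark`:
assert pick_up(tasks, {"default", "spark"}) == ["light_job", "spark_job", "model_fit"]
```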
</div>
<div class="section" id="xcoms">
<h3 class="sigil_not_in_toc">XComs</h3>
<p>XComs let tasks exchange messages, allowing more nuanced forms of control and
shared state. The name is an abbreviation of &#x201C;cross-communication&#x201D;. XComs are
principally defined by a key, value, and timestamp, but also track attributes
like the task/DAG that created the XCom and when it should become visible. Any
object that can be pickled can be used as an XCom value, so users should make
sure to use objects of appropriate size.</p>
<p>XComs can be &#x201C;pushed&#x201D; (sent) or &#x201C;pulled&#x201D; (received). When a task pushes an
XCom, it makes it generally available to other tasks. Tasks can push XComs at
any time by calling the <code class="docutils literal notranslate"><span class="pre">xcom_push()</span></code> method. In addition, if a task returns
a value (either from its Operator&#x2019;s <code class="docutils literal notranslate"><span class="pre">execute()</span></code> method, or from a
PythonOperator&#x2019;s <code class="docutils literal notranslate"><span class="pre">python_callable</span></code> function), then an XCom containing that
value is automatically pushed.</p>
<p>Tasks call <code class="docutils literal notranslate"><span class="pre">xcom_pull()</span></code> to retrieve XComs, optionally applying filters
based on criteria like <code class="docutils literal notranslate"><span class="pre">key</span></code>, source <code class="docutils literal notranslate"><span class="pre">task_ids</span></code>, and source <code class="docutils literal notranslate"><span class="pre">dag_id</span></code>. By
default, <code class="docutils literal notranslate"><span class="pre">xcom_pull()</span></code> filters for the keys that are automatically given to
XComs when they are pushed by being returned from execute functions (as
opposed to XComs that are pushed manually).</p>
<p>If <code class="docutils literal notranslate"><span class="pre">xcom_pull</span></code> is passed a single string for <code class="docutils literal notranslate"><span class="pre">task_ids</span></code>, then the most
recent XCom value from that task is returned; if a list of <code class="docutils literal notranslate"><span class="pre">task_ids</span></code> is
passed, then a corresponding list of XCom values is returned.</p>
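<p>These pull semantics can be illustrated with a toy in-memory store. This is not the Airflow API, just a sketch of the key/task_id filtering described above:</p>

```python
# Toy model of XCom storage: (dag_id, task_id, key) -> value.
# Real XComs live in Airflow's metadata database; this only
# mirrors the filtering semantics described in the text.
store = {}

def xcom_push(task_id, value, key='return_value', dag_id='my_dag'):
    store[(dag_id, task_id, key)] = value

def xcom_pull(task_ids, key='return_value', dag_id='my_dag'):
    # A single task_id returns one value; a list of task_ids
    # returns a corresponding list of values.
    if isinstance(task_ids, str):
        return store.get((dag_id, task_ids, key))
    return [store.get((dag_id, t, key)) for t in task_ids]

xcom_push('pushing_task', 42)
xcom_push('other_task', 'hello')
print(xcom_pull('pushing_task'))                  # -> 42
print(xcom_pull(['pushing_task', 'other_task']))  # -> [42, 'hello']
```

<p>In Airflow itself, the equivalent calls inside operator callables look like this:</p>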
<div class="code python highlight-default notranslate"><div class="highlight"><pre># inside a PythonOperator called 'pushing_task'
def push_function():
    return value  # 'value' stands in for whatever this task computes

# inside another PythonOperator where provide_context=True
def pull_function(**context):
    value = context['task_instance'].xcom_pull(task_ids='pushing_task')
</pre>
</pre>
</div>
</div>
<p>It is also possible to pull XComs directly in a template; here&#x2019;s an example
of what this may look like:</p>
<div class="code sql highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">SELECT</span> <span class="o">*</span> <span class="n">FROM</span> <span class="p">{{</span> <span class="n">task_instance</span><span class="o">.</span><span class="n">xcom_pull</span><span class="p">(</span><span class="n">task_ids</span><span class="o">=</span><span class="s1">&apos;foo&apos;</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s1">&apos;table_name&apos;</span><span class="p">)</span> <span class="p">}}</span>
</pre>
</div>
</div>
<p>Note that XComs are similar to <a class="reference internal" href="#variables">Variables</a>, but are specifically designed
for inter-task communication rather than global settings.</p>
</div>
<div class="section" id="variables">
<h3 class="sigil_not_in_toc">Variables</h3>
<p>Variables are a generic way to store and retrieve arbitrary content or
settings as a simple key value store within Airflow. Variables can be
listed, created, updated and deleted from the UI (<code class="docutils literal notranslate"><span class="pre">Admin</span> <span class="pre">-&gt;</span> <span class="pre">Variables</span></code>),
code or CLI. In addition, json settings files can be bulk uploaded through
the UI. While your pipeline code definition and most of your constants
and variables should be defined in code and stored in source control,
it can be useful to have some variables or configuration items
accessible and modifiable through the UI.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">airflow.models</span> <span class="k">import</span> <span class="n">Variable</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">Variable</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;foo&quot;</span><span class="p">)</span>
<span class="n">bar</span> <span class="o">=</span> <span class="n">Variable</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;bar&quot;</span><span class="p">,</span> <span class="n">deserialize_json</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre>
</div>
</div>
<p>The second call assumes <code class="docutils literal notranslate"><span class="pre">json</span></code> content and will be deserialized into
<code class="docutils literal notranslate"><span class="pre">bar</span></code>. Note that <code class="docutils literal notranslate"><span class="pre">Variable</span></code> is a sqlalchemy model and can be used
as such.</p>
<p>You can use a variable from a jinja template with the syntax:</p>
<div class="code bash highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">echo</span> <span class="p">{{</span> <span class="n">var</span><span class="o">.</span><span class="n">value</span><span class="o">.&lt;</span><span class="n">variable_name</span><span class="o">&gt;</span> <span class="p">}}</span>
</pre>
</div>
</div>
<p>or if you need to deserialize a json object from the variable:</p>
<div class="code bash highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">echo</span> <span class="p">{{</span> <span class="n">var</span><span class="o">.</span><span class="n">json</span><span class="o">.&lt;</span><span class="n">variable_name</span><span class="o">&gt;</span> <span class="p">}}</span>
</pre>
</div>
</div>
</div>
<div class="section" id="branching">
<h3 class="sigil_not_in_toc">Branching</h3>
<p>Sometimes you need a workflow to branch, or only go down a certain path
based on an arbitrary condition which is typically related to something
that happened in an upstream task. One way to do this is by using the
<code class="docutils literal notranslate"><span class="pre">BranchPythonOperator</span></code>.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">BranchPythonOperator</span></code> is much like the PythonOperator except that it
expects a python_callable that returns a task_id. The returned task_id
is followed, and all of the other paths are skipped.
The task_id returned by the Python function must reference a task
directly downstream of the BranchPythonOperator task.</p>
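<p>The callable itself is ordinary Python; a minimal sketch (the task ids and the even-day condition here are hypothetical):</p>

```python
import datetime as dt

# Returns the task_id of the downstream branch to follow;
# every other direct downstream path is skipped.
def choose_branch(**context):
    if context['execution_date'].day % 2 == 0:
        return 'even_day_task'
    return 'odd_day_task'

# With Airflow this would be passed as the python_callable of a
# BranchPythonOperator; here we just call it with a fake context.
print(choose_branch(execution_date=dt.datetime(2016, 1, 2)))  # -> even_day_task
```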
<p>Note that using tasks with <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code> downstream from
<code class="docutils literal notranslate"><span class="pre">BranchPythonOperator</span></code> is logically unsound, as the <code class="docutils literal notranslate"><span class="pre">skipped</span></code> status
will invariably block tasks that depend on their past successes.
The <code class="docutils literal notranslate"><span class="pre">skipped</span></code> state propagates downstream: a task is skipped when all
of its directly upstream tasks are <code class="docutils literal notranslate"><span class="pre">skipped</span></code>.</p>
<p>If you want to skip some tasks, keep in mind that you can&#x2019;t have an empty
path; if you need one, make a dummy task.</p>
<p>Like this, where the dummy task &#x201C;branch_false&#x201D; is skipped:</p>
<img alt="https://airflow.apache.org/_images/branch_good.png" src="../img/05acb41b38e78540e05e8e0f1d907a51.jpg">
<p>Not like this, where the join task is skipped</p>
<img alt="https://airflow.apache.org/_images/branch_bad.png" src="../img/fb5803a17d365a3c32b19e03e28a9fde.jpg">
</div>
<div class="section" id="subdags">
<h3 class="sigil_not_in_toc">SubDAGs</h3>
<p>SubDAGs are perfect for repeating patterns. Defining a function that returns a
DAG object is a nice design pattern when using Airflow.</p>
<p>Airbnb uses the <em>stage-check-exchange</em> pattern when loading data. Data is staged
in a temporary table, after which data quality checks are performed against
that table. Once the checks all pass the partition is moved into the production
table.</p>
<p>As another example, consider the following DAG:</p>
<img alt="https://airflow.apache.org/_images/subdag_before.png" src="../img/e9ea586cae938fc2b87189ba6c5cb4f5.jpg">
<p>We can combine all of the parallel <code class="docutils literal notranslate"><span class="pre">task-*</span></code> operators into a single SubDAG,
so that the resulting DAG resembles the following:</p>
<img alt="https://airflow.apache.org/_images/subdag_after.png" src="../img/9231dcec481ea674f2cd8706b9bf499d.jpg">
<p>Note that SubDAG operators should contain a factory method that returns a DAG
object. This will prevent the SubDAG from being treated like a separate DAG in
the main UI. For example:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre># dags/subdag.py
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator


# DAG is returned by a factory method
def sub_dag(parent_dag_name, child_dag_name, start_date, schedule_interval):
    dag = DAG(
        '%s.%s' % (parent_dag_name, child_dag_name),
        schedule_interval=schedule_interval,
        start_date=start_date,
    )

    dummy_operator = DummyOperator(
        task_id='dummy_task',
        dag=dag,
    )
    return dag
</pre>
</pre>
</div>
</div>
<p>This SubDAG can then be referenced in your main DAG file:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre># main_dag.py
from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.subdag_operator import SubDagOperator
from dags.subdag import sub_dag


PARENT_DAG_NAME = 'parent_dag'
CHILD_DAG_NAME = 'child_dag'

main_dag = DAG(
    dag_id=PARENT_DAG_NAME,
    schedule_interval=timedelta(hours=1),
    start_date=datetime(2016, 1, 1)
)

sub_dag = SubDagOperator(
    subdag=sub_dag(PARENT_DAG_NAME, CHILD_DAG_NAME, main_dag.start_date,
                   main_dag.schedule_interval),
    task_id=CHILD_DAG_NAME,
    dag=main_dag,
)
</pre>
</pre>
</div>
</div>
<p>You can zoom into a SubDagOperator from the graph view of the main DAG to show
the tasks contained within the SubDAG:</p>
<img alt="https://airflow.apache.org/_images/subdag_zoom.png" src="../img/764cd9d9d35739e2aaba43358950aed5.jpg">
<p>Some other tips when using SubDAGs:</p>
<ul class="simple">
<li>by convention, a SubDAG&#x2019;s <code class="docutils literal notranslate"><span class="pre">dag_id</span></code> should be prefixed by its parent and
a dot. As in <code class="docutils literal notranslate"><span class="pre">parent.child</span></code></li>
<li>share arguments between the main DAG and the SubDAG by passing arguments to
the SubDAG operator (as demonstrated above)</li>
<li>SubDAGs must have a schedule and be enabled. If the SubDAG&#x2019;s schedule is
set to <code class="docutils literal notranslate"><span class="pre">None</span></code> or <code class="docutils literal notranslate"><span class="pre">@once</span></code>, the SubDAG will succeed without having done
anything</li>
<li>clearing a SubDagOperator also clears the state of the tasks within</li>
<li>marking success on a SubDagOperator does not affect the state of the tasks
within</li>
<li>refrain from using <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code> in tasks within the SubDAG as
this can be confusing</li>
<li>it is possible to specify an executor for the SubDAG. It is common to use
the SequentialExecutor if you want to run the SubDAG in-process and
effectively limit its parallelism to one. Using LocalExecutor can be
problematic as it may over-subscribe your worker, running multiple tasks in
a single slot</li>
</ul>
<p>See <code class="docutils literal notranslate"><span class="pre">airflow/example_dags</span></code> for a demonstration.</p>
</div>
<div class="section" id="slas">
<h3 class="sigil_not_in_toc">SLAs</h3>
<p>Service Level Agreements, or time by which a task or DAG should have
succeeded, can be set at a task level as a <code class="docutils literal notranslate"><span class="pre">timedelta</span></code>. If
one or many instances have not succeeded by that time, an alert email is sent
detailing the list of tasks that missed their SLA. The event is also recorded
in the database and made available in the web UI under <code class="docutils literal notranslate"><span class="pre">Browse-&gt;Missed</span> <span class="pre">SLAs</span></code>
where events can be analyzed and documented.</p>
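<p>Conceptually, an SLA miss is just a clock comparison; a rough sketch of the check (simplified from what the scheduler actually does):</p>

```python
from datetime import datetime, timedelta

def sla_missed(execution_date, sla, succeeded_at=None, now=None):
    """A task misses its SLA if it has not succeeded by execution_date + sla."""
    now = now or datetime.utcnow()
    deadline = execution_date + sla
    if succeeded_at is not None:
        return succeeded_at > deadline
    return now > deadline

run = datetime(2016, 1, 1, 0, 0)
print(sla_missed(run, timedelta(hours=1), now=datetime(2016, 1, 1, 2, 0)))  # -> True
```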
</div>
<div class="section" id="trigger-rules">
<h3 class="sigil_not_in_toc">Trigger Rules</h3>
<p>Though the normal workflow behavior is to trigger tasks when all their
directly upstream tasks have succeeded, Airflow allows for more complex
dependency settings.</p>
<p>All operators have a <code class="docutils literal notranslate"><span class="pre">trigger_rule</span></code> argument which defines the rule by which
the generated task gets triggered. The default value for <code class="docutils literal notranslate"><span class="pre">trigger_rule</span></code> is
<code class="docutils literal notranslate"><span class="pre">all_success</span></code>, which can be described as &#x201C;trigger this task when all directly
upstream tasks have succeeded&#x201D;. All other rules described here are based
on direct parent tasks and are values that can be passed to any operator
while creating tasks:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">all_success</span></code>: (default) all parents have succeeded</li>
<li><code class="docutils literal notranslate"><span class="pre">all_failed</span></code>: all parents are in a <code class="docutils literal notranslate"><span class="pre">failed</span></code> or <code class="docutils literal notranslate"><span class="pre">upstream_failed</span></code> state</li>
<li><code class="docutils literal notranslate"><span class="pre">all_done</span></code>: all parents are done with their execution</li>
<li><code class="docutils literal notranslate"><span class="pre">one_failed</span></code>: fires as soon as at least one parent has failed, it does not wait for all parents to be done</li>
<li><code class="docutils literal notranslate"><span class="pre">one_success</span></code>: fires as soon as at least one parent succeeds, it does not wait for all parents to be done</li>
<li><code class="docutils literal notranslate"><span class="pre">dummy</span></code>: dependencies are just for show, trigger at will</li>
</ul>
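<p>The rules above can be expressed as predicates over the parents&#x2019; states; a simplified sketch (ignoring skipped-state cascading):</p>

```python
def should_trigger(rule, parent_states):
    # parent_states: e.g. 'success', 'failed', 'upstream_failed',
    # 'skipped', 'running'
    done = all(s in ('success', 'failed', 'upstream_failed', 'skipped')
               for s in parent_states)
    if rule == 'all_success':
        return all(s == 'success' for s in parent_states)
    if rule == 'all_failed':
        return all(s in ('failed', 'upstream_failed') for s in parent_states)
    if rule == 'all_done':
        return done
    if rule == 'one_failed':
        # Fires as soon as one parent fails, without waiting for the rest
        return any(s in ('failed', 'upstream_failed') for s in parent_states)
    if rule == 'one_success':
        return any(s == 'success' for s in parent_states)
    if rule == 'dummy':
        return True
    raise ValueError('unknown rule: %s' % rule)

print(should_trigger('one_failed', ['success', 'failed', 'running']))  # -> True
print(should_trigger('all_success', ['success', 'skipped']))           # -> False
```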
<p>Note that these can be used in conjunction with <code class="docutils literal notranslate"><span class="pre">depends_on_past</span></code> (boolean)
that, when set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, keeps a task from getting triggered if the
previous schedule for the task hasn&#x2019;t succeeded.</p>
</div>
<div class="section" id="latest-run-only">
<h3 class="sigil_not_in_toc">Latest Run Only</h3>
<p>Standard workflow behavior involves running a series of tasks for a
particular date/time range. Some workflows, however, perform tasks that
are independent of run time but need to be run on a schedule, much like a
standard cron job. In these cases, backfills or running jobs missed during
a pause just wastes CPU cycles.</p>
<p>For situations like this, you can use the <code class="docutils literal notranslate"><span class="pre">LatestOnlyOperator</span></code> to skip
tasks that are not being run during the most recent scheduled run for a
DAG. The <code class="docutils literal notranslate"><span class="pre">LatestOnlyOperator</span></code> skips all immediate downstream tasks, and
itself, if the time right now is not between its <code class="docutils literal notranslate"><span class="pre">execution_time</span></code> and the
next scheduled <code class="docutils literal notranslate"><span class="pre">execution_time</span></code>.</p>
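<p>The window check can be sketched as a plain interval test (a simplification of the operator&#x2019;s logic, using naive datetimes):</p>

```python
import datetime as dt

def in_latest_window(now, execution_date, schedule_interval):
    # The run is "latest" when now falls between this run's
    # execution time and the next scheduled one.
    return execution_date <= now < execution_date + schedule_interval

run = dt.datetime(2016, 9, 20, 8, 0)
print(in_latest_window(dt.datetime(2016, 9, 20, 9, 30), run, dt.timedelta(hours=4)))  # -> True
print(in_latest_window(dt.datetime(2016, 9, 21, 0, 0), run, dt.timedelta(hours=4)))   # -> False
```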
<p>One must be aware of the interaction between skipped tasks and trigger
rules. Skipped tasks will cascade through trigger rules <code class="docutils literal notranslate"><span class="pre">all_success</span></code>
and <code class="docutils literal notranslate"><span class="pre">all_failed</span></code> but not <code class="docutils literal notranslate"><span class="pre">all_done</span></code>, <code class="docutils literal notranslate"><span class="pre">one_failed</span></code>, <code class="docutils literal notranslate"><span class="pre">one_success</span></code>,
and <code class="docutils literal notranslate"><span class="pre">dummy</span></code>. If you would like to use the <code class="docutils literal notranslate"><span class="pre">LatestOnlyOperator</span></code> with
trigger rules that do not cascade skips, you will need to ensure that the
<code class="docutils literal notranslate"><span class="pre">LatestOnlyOperator</span></code> is <strong>directly</strong> upstream of the task you would like
to skip.</p>
<p>It is possible, through the use of trigger rules, to mix tasks that should
run in the typical date/time-dependent mode with those using the
<code class="docutils literal notranslate"><span class="pre">LatestOnlyOperator</span></code>.</p>
<p>For example, consider the following dag:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre># dags/latest_only_with_trigger.py
import datetime as dt

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.latest_only_operator import LatestOnlyOperator
from airflow.utils.trigger_rule import TriggerRule


dag = DAG(
    dag_id='latest_only_with_trigger',
    schedule_interval=dt.timedelta(hours=4),
    start_date=dt.datetime(2016, 9, 20),
)

latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)

task1 = DummyOperator(task_id='task1', dag=dag)
task1.set_upstream(latest_only)

task2 = DummyOperator(task_id='task2', dag=dag)

task3 = DummyOperator(task_id='task3', dag=dag)
task3.set_upstream([task1, task2])

task4 = DummyOperator(task_id='task4', dag=dag,
                      trigger_rule=TriggerRule.ALL_DONE)
task4.set_upstream([task1, task2])
</pre>
</pre>
</div>
</div>
<p>In the case of this dag, the <code class="docutils literal notranslate"><span class="pre">latest_only</span></code> task will show up as skipped
for all runs except the latest run. <code class="docutils literal notranslate"><span class="pre">task1</span></code> is directly downstream of
<code class="docutils literal notranslate"><span class="pre">latest_only</span></code> and will also skip for all runs except the latest.
<code class="docutils literal notranslate"><span class="pre">task2</span></code> is entirely independent of <code class="docutils literal notranslate"><span class="pre">latest_only</span></code> and will run in all
scheduled periods. <code class="docutils literal notranslate"><span class="pre">task3</span></code> is downstream of <code class="docutils literal notranslate"><span class="pre">task1</span></code> and <code class="docutils literal notranslate"><span class="pre">task2</span></code> and
because of the default <code class="docutils literal notranslate"><span class="pre">trigger_rule</span></code> being <code class="docutils literal notranslate"><span class="pre">all_success</span></code> will receive
a cascaded skip from <code class="docutils literal notranslate"><span class="pre">task1</span></code>. <code class="docutils literal notranslate"><span class="pre">task4</span></code> is downstream of <code class="docutils literal notranslate"><span class="pre">task1</span></code> and
<code class="docutils literal notranslate"><span class="pre">task2</span></code> but since its <code class="docutils literal notranslate"><span class="pre">trigger_rule</span></code> is set to <code class="docutils literal notranslate"><span class="pre">all_done</span></code> it will
trigger as soon as <code class="docutils literal notranslate"><span class="pre">task1</span></code> has been skipped (a valid completion state)
and <code class="docutils literal notranslate"><span class="pre">task2</span></code> has succeeded.</p>
<img alt="https://airflow.apache.org/_images/latest_only_with_trigger.png" src="../img/c93b5f5bd01ebe0b580398d4943a20f3.jpg">
</div>
<div class="section" id="zombies-undeads">
<h3 class="sigil_not_in_toc">Zombies &amp; Undeads</h3>
<p>Task instances die all the time, usually as part of their normal life cycle,
but sometimes unexpectedly.</p>
<p>Zombie tasks are characterized by the absence
of a heartbeat (emitted by the job periodically) and a <code class="docutils literal notranslate"><span class="pre">running</span></code> status
in the database. They can occur when a worker node can&#x2019;t reach the database,
when Airflow processes are killed externally, or when a node gets rebooted,
for instance. Zombie killing is performed periodically by the scheduler&#x2019;s
process.</p>
<p>Undead processes are characterized by the existence of a process and a matching
heartbeat, but Airflow isn&#x2019;t aware of this task as <code class="docutils literal notranslate"><span class="pre">running</span></code> in the database.
This mismatch typically occurs as the state of the database is altered,
most likely by deleting rows in the &#x201C;Task Instances&#x201D; view in the UI.
Tasks are instructed to verify their state as part of the heartbeat routine,
and terminate themselves upon figuring out that they are in this &#x201C;undead&#x201D;
state.</p>
</div>
<div class="section" id="cluster-policy">
<h3 class="sigil_not_in_toc">Cluster Policy</h3>
<p>Your local airflow settings file can define a <code class="docutils literal notranslate"><span class="pre">policy</span></code> function that
has the ability to mutate task attributes based on other task or DAG
attributes. It receives a single task object as its argument
and is expected to alter that task&#x2019;s attributes in place.</p>
<p>For example, this function could apply a specific queue property when
using a specific operator, or enforce a task timeout policy, making sure
that no tasks run for more than 48 hours. Here&#x2019;s an example of what this
may look like inside your <code class="docutils literal notranslate"><span class="pre">airflow_settings.py</span></code>:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre>from datetime import timedelta


def policy(task):
    if task.__class__.__name__ == 'HivePartitionSensor':
        task.queue = "sensor_queue"
    if task.timeout &gt; timedelta(hours=48):
        task.timeout = timedelta(hours=48)
</pre>
</pre>
</div>
</div>
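<p>Because the policy mutates the task in place, it is easy to exercise against a stand-in object (a sketch; real policies receive Airflow task instances):</p>

```python
from datetime import timedelta

class FakeTask:
    """Stand-in exposing just the attributes the policy touches."""
    def __init__(self, queue, timeout):
        self.queue = queue
        self.timeout = timeout

def policy(task):
    # Same logic as the example policy above
    if task.__class__.__name__ == 'HivePartitionSensor':
        task.queue = 'sensor_queue'
    if task.timeout > timedelta(hours=48):
        task.timeout = timedelta(hours=48)

t = FakeTask(queue='default', timeout=timedelta(hours=72))
policy(t)
print(t.timeout)  # -> 2 days, 0:00:00 (clamped to 48 hours)
```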
</div>
<div class="section" id="documentation-notes">
<h3 class="sigil_not_in_toc">Documentation &amp; Notes</h3>
<p>It&#x2019;s possible to add documentation or notes to your dags &amp; task objects that
become visible in the web interface (&#x201C;Graph View&#x201D; for dags, &#x201C;Task Details&#x201D; for
tasks). There are a set of special task attributes that get rendered as rich
content if defined:</p>
<table border="1" class="docutils">
<colgroup>
<col width="38%">
<col width="62%">
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">attribute</th>
<th class="head">rendered to</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>doc</td>
<td>monospace</td>
</tr>
<tr class="row-odd"><td>doc_json</td>
<td>json</td>
</tr>
<tr class="row-even"><td>doc_yaml</td>
<td>yaml</td>
</tr>
<tr class="row-odd"><td>doc_md</td>
<td>markdown</td>
</tr>
<tr class="row-even"><td>doc_rst</td>
<td>reStructuredText</td>
</tr>
</tbody>
</table>
<p>Please note that for dags, doc_md is the only attribute interpreted.</p>
<p>This is especially useful if your tasks are built dynamically from
configuration files, as it allows you to expose the configuration that led
to the related tasks in Airflow.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">### My great DAG</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_dag&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span>
<span class="n">dag</span><span class="o">.</span><span class="n">doc_md</span> <span class="o">=</span> <span class="vm">__doc__</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span><span class="s2">&quot;foo&quot;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">t</span><span class="o">.</span><span class="n">doc_md</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span><span class="se">\</span>
<span class="s2">#Title&quot;</span>
<span class="s2">Here&apos;s a [url](www.airbnb.com)</span>
<span class="s2">&quot;&quot;&quot;</span>
</pre>
</div>
</div>
<p>This content will get rendered as markdown respectively in the &#x201C;Graph View&#x201D; and
&#x201C;Task Details&#x201D; pages.</p>
</div>
<div class="section" id="jinja-templating">
<span id="id1"></span><h3 class="sigil_not_in_toc">Jinja Templating</h3>
<p>Airflow leverages the power of
<a class="reference external" href="http://jinja.pocoo.org/docs/dev/">Jinja Templating</a> and this can be a
powerful tool to use in combination with macros (see the <a class="reference internal" href="code.html#macros"><span class="std std-ref">Macros</span></a> section).</p>
<p>For example, say you want to pass the execution date as an environment variable
to a Bash script using the <code class="docutils literal notranslate"><span class="pre">BashOperator</span></code>.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># The execution date as YYYY-MM-DD</span>
<span class="n">date</span> <span class="o">=</span> <span class="s2">&quot;{{ ds }}&quot;</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;test_env&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;/tmp/test.sh &apos;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">,</span>
<span class="n">env</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;EXECUTION_DATE&apos;</span><span class="p">:</span> <span class="n">date</span><span class="p">})</span>
</pre>
</div>
</div>
<p>Here, <code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">ds</span> <span class="pre">}}</span></code> is a macro, and because the <code class="docutils literal notranslate"><span class="pre">env</span></code> parameter of the
<code class="docutils literal notranslate"><span class="pre">BashOperator</span></code> is templated with Jinja, the execution date will be available
as an environment variable named <code class="docutils literal notranslate"><span class="pre">EXECUTION_DATE</span></code> in your Bash script.</p>
<p>You can use Jinja templating with every parameter that is marked as &#x201C;templated&#x201D;
in the documentation. Template substitution occurs just before the pre_execute
function of your operator is called.</p>
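<p>Conceptually, the substitution step works like the sketch below. This is a simplified stand-in using a plain regex rather than Airflow&#x2019;s actual Jinja engine; the <code class="docutils literal notranslate"><span class="pre">render</span></code> helper and the hard-coded context are illustrative only.</p>

```python
# Simplified sketch of template substitution: each templated attribute
# is rendered against a context dict just before execution. A bare-bones
# regex substitution stands in for the real Jinja engine here.
import re


def render(template, context):
    """Replace every {{ key }} with its value from the context."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        template,
    )


context = {"ds": "2018-01-01"}        # Airflow supplies ds, ts, etc.
env = {"EXECUTION_DATE": "{{ ds }}"}  # as passed to BashOperator(env=...)
rendered_env = {k: render(v, context) for k, v in env.items()}
print(rendered_env)  # {'EXECUTION_DATE': '2018-01-01'}
```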
</div>
</div>
<div class="section" id="packaged-dags">
<h2 class="sigil_not_in_toc">Packaged dags</h2>
<p>While you will often specify dags in a single <code class="docutils literal notranslate"><span class="pre">.py</span></code> file, it can sometimes
be necessary to combine a dag with its dependencies. For example, you might want
to bundle several dags together so they can be versioned and managed as a unit,
or you might need an extra module that is not available
by default on the system you are running Airflow on. To allow this you can create
a zip file that contains the dag(s) in the root of the zip file and has the extra
modules unpacked in directories.</p>
<p>For instance you can create a zip file that looks like this:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>my_dag1.py
my_dag2.py
package1/__init__.py
package1/functions.py
</pre>
</div>
</div>
<p>Airflow will scan the zip file and try to load <code class="docutils literal notranslate"><span class="pre">my_dag1.py</span></code> and <code class="docutils literal notranslate"><span class="pre">my_dag2.py</span></code>.
It will not go into subdirectories as these are considered to be potential
packages.</p>
<p>In case you would like to add module dependencies to your DAG you basically would
do the same, but then it is more practical to use a virtualenv and pip.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>virtualenv zip_dag
<span class="nb">source</span> zip_dag/bin/activate
mkdir zip_dag_contents
<span class="nb">cd</span> zip_dag_contents
pip install --install-option<span class="o">=</span><span class="s2">&quot;--install-lib=</span><span class="nv">$PWD</span><span class="s2">&quot;</span> my_useful_package
cp ~/my_dag.py .
zip -r zip_dag.zip *
</pre>
</div>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">the zip file will be inserted at the beginning of the module search path
(sys.path), and as such it will be available to any other code that resides
within the same interpreter.</p>
</div>
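<p>The mechanism from the note above can be demonstrated with plain Python: modules at the root of a zip file (and packages inside it) become importable once the zip is prepended to <code class="docutils literal notranslate"><span class="pre">sys.path</span></code>. This is a simplified illustration, not Airflow&#x2019;s actual loader; the module contents are made up for the demo.</p>

```python
# Build a zip with the layout shown earlier, put it on sys.path, and
# import the dag module from it, which in turn imports package1.
import importlib
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "zip_dag.zip")

with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("my_dag1.py",
                "from package1.functions import helper\n"
                "VALUE = helper()\n")
    zf.writestr("package1/__init__.py", "")
    zf.writestr("package1/functions.py", "def helper():\n    return 42\n")

sys.path.insert(0, zip_path)  # what Airflow does for packaged dags
my_dag1 = importlib.import_module("my_dag1")
print(my_dag1.VALUE)  # 42
```

Note that because the zip sits at the front of <code class="docutils literal notranslate"><span class="pre">sys.path</span></code>, its modules can shadow same-named modules elsewhere on the path.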
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">packaged dags cannot be used with pickling turned on.</p>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">packaged dags cannot contain dynamic libraries (e.g. libz.so); these need
to be available on the system if a module needs them. In other words, only
pure Python modules can be packaged.</p>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Data Profiling</h1>
<p>Part of being productive with data is having the right weapons to
profile the data you are working with. Airflow provides a simple query
interface to write SQL and get results quickly, and a charting application
letting you visualize data.</p>
<div class="section" id="adhoc-queries">
<h2 class="sigil_not_in_toc">Adhoc Queries</h2>
<p>The adhoc query UI allows for simple SQL interactions with the database
connections registered in Airflow.</p>
<img alt="https://airflow.apache.org/_images/adhoc.png" src="../img/bfbf60f9689630d6aa1f46aeab1e6cf0.jpg">
</div>
<div class="section" id="charts">
<h2 class="sigil_not_in_toc">Charts</h2>
<p>A simple UI built on top of flask-admin and highcharts allows building
data visualizations and charts easily. Fill in a form with a label, SQL,
chart type, pick a source database from your environment&#x2019;s connections,
select a few other options, and save it for later use.</p>
<p>You can even use the same templating and macros available when writing
airflow pipelines, parameterizing your queries and modifying parameters
directly in the URL.</p>
<p>These charts are basic, but they&#x2019;re easy to create, modify and share.</p>
<div class="section" id="chart-screenshot">
<h3 class="sigil_not_in_toc">Chart Screenshot</h3>
<img alt="https://airflow.apache.org/_images/chart.png" src="../img/a7247daabfaa0606cbb0d05e511194db.jpg">
</div>
<hr class="docutils">
<div class="section" id="chart-form-screenshot">
<h3 class="sigil_not_in_toc">Chart Form Screenshot</h3>
<img alt="https://airflow.apache.org/_images/chart_form.png" src="../img/a40de0ada10bc0250de4b6c082cb7660.jpg">
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Command Line Interface</h1>
<p>Airflow has a very rich command line interface that allows for
many types of operations on a DAG, starting services, and supporting
development and testing.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">usage</span><span class="p">:</span> <span class="n">airflow</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span>
<span class="p">{</span><span class="n">resetdb</span><span class="p">,</span><span class="n">render</span><span class="p">,</span><span class="n">variables</span><span class="p">,</span><span class="n">connections</span><span class="p">,</span><span class="n">create_user</span><span class="p">,</span><span class="n">pause</span><span class="p">,</span><span class="n">task_failed_deps</span><span class="p">,</span><span class="n">version</span><span class="p">,</span><span class="n">trigger_dag</span><span class="p">,</span><span class="n">initdb</span><span class="p">,</span><span class="n">test</span><span class="p">,</span><span class="n">unpause</span><span class="p">,</span><span class="n">dag_state</span><span class="p">,</span><span class="n">run</span><span class="p">,</span><span class="n">list_tasks</span><span class="p">,</span><span class="n">backfill</span><span class="p">,</span><span class="n">list_dags</span><span class="p">,</span><span class="n">kerberos</span><span class="p">,</span><span class="n">worker</span><span class="p">,</span><span class="n">webserver</span><span class="p">,</span><span class="n">flower</span><span class="p">,</span><span class="n">scheduler</span><span class="p">,</span><span class="n">task_state</span><span class="p">,</span><span class="n">pool</span><span class="p">,</span><span class="n">serve_logs</span><span class="p">,</span><span class="n">clear</span><span class="p">,</span><span class="n">upgradedb</span><span class="p">,</span><span class="n">delete_dag</span><span class="p">}</span>
<span class="o">...</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments">
<h2 class="sigil_not_in_toc">Positional Arguments</h2>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>subcommand</kbd></td>
<td><p class="first">Possible choices: resetdb, render, variables, connections, create_user, pause, task_failed_deps, version, trigger_dag, initdb, test, unpause, dag_state, run, list_tasks, backfill, list_dags, kerberos, worker, webserver, flower, scheduler, task_state, pool, serve_logs, clear, upgradedb, delete_dag</p>
<p class="last">sub-command help</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Sub-commands:">
<h2 class="sigil_not_in_toc">Sub-commands:</h2>
<div class="section" id="resetdb">
<h3 class="sigil_not_in_toc">resetdb</h3>
<p>Burn down and rebuild the metadata database</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">resetdb</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">y</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-y, --yes</kbd></td>
<td><p class="first">Do not prompt to confirm reset. Use with care!</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="render">
<h3 class="sigil_not_in_toc">render</h3>
<p>Render a task instance&#x2019;s template(s)</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">render</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span> <span class="n">task_id</span> <span class="n">execution_date</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat1">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
<tr><td class="option-group">
<kbd>task_id</kbd></td>
<td>The id of the task</td>
</tr>
<tr><td class="option-group">
<kbd>execution_date</kbd></td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat1">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: ~/airflow/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="variables">
<h3 class="sigil_not_in_toc">variables</h3>
<p>CRUD operations on variables</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">variables</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">s</span> <span class="n">KEY</span> <span class="n">VAL</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">g</span> <span class="n">KEY</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">j</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">d</span> <span class="n">VAL</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">i</span> <span class="n">FILEPATH</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">e</span> <span class="n">FILEPATH</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">x</span> <span class="n">KEY</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat2">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-s, --set</kbd></td>
<td>Set a variable</td>
</tr>
<tr><td class="option-group">
<kbd>-g, --get</kbd></td>
<td>Get value of a variable</td>
</tr>
<tr><td class="option-group">
<kbd>-j, --json</kbd></td>
<td><p class="first">Deserialize JSON variable</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-d, --default</kbd></td>
<td>Default value returned if variable does not exist</td>
</tr>
<tr><td class="option-group">
<kbd>-i, --import</kbd></td>
<td>Import variables from JSON file</td>
</tr>
<tr><td class="option-group">
<kbd>-e, --export</kbd></td>
<td>Export variables to JSON file</td>
</tr>
<tr><td class="option-group">
<kbd>-x, --delete</kbd></td>
<td>Delete a variable</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="connections">
<h3 class="sigil_not_in_toc">connections</h3>
<p>List/Add/Delete connections</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">connections</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">l</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">a</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">d</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">conn_id</span> <span class="n">CONN_ID</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">conn_uri</span> <span class="n">CONN_URI</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">conn_extra</span> <span class="n">CONN_EXTRA</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">conn_type</span> <span class="n">CONN_TYPE</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">conn_host</span> <span class="n">CONN_HOST</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">conn_login</span> <span class="n">CONN_LOGIN</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">conn_password</span> <span class="n">CONN_PASSWORD</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">conn_schema</span> <span class="n">CONN_SCHEMA</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">conn_port</span> <span class="n">CONN_PORT</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat3">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-l, --list</kbd></td>
<td><p class="first">List all connections</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-a, --add</kbd></td>
<td><p class="first">Add a connection</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-d, --delete</kbd></td>
<td><p class="first">Delete a connection</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_id</kbd></td>
<td>Connection id, required to add/delete a connection</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_uri</kbd></td>
<td>Connection URI, required to add a connection without conn_type</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_extra</kbd></td>
<td>Connection <cite>Extra</cite> field, optional when adding a connection</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_type</kbd></td>
<td>Connection type, required to add a connection without conn_uri</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_host</kbd></td>
<td>Connection host, optional when adding a connection</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_login</kbd></td>
<td>Connection login, optional when adding a connection</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>--conn_password</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Connection password, optional when adding a connection</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_schema</kbd></td>
<td>Connection schema, optional when adding a connection</td>
</tr>
<tr><td class="option-group">
<kbd>--conn_port</kbd></td>
<td>Connection port, optional when adding a connection</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="create_user">
<h3 class="sigil_not_in_toc">create_user</h3>
<p>Create an admin account</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">create_user</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">r</span> <span class="n">ROLE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">u</span> <span class="n">USERNAME</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">e</span> <span class="n">EMAIL</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">f</span> <span class="n">FIRSTNAME</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">l</span> <span class="n">LASTNAME</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">p</span> <span class="n">PASSWORD</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">use_random_password</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat4">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-r, --role</kbd></td>
<td>Role of the user. Existing roles include Admin, User, Op, Viewer, and Public</td>
</tr>
<tr><td class="option-group">
<kbd>-u, --username</kbd></td>
<td>Username of the user</td>
</tr>
<tr><td class="option-group">
<kbd>-e, --email</kbd></td>
<td>Email of the user</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-f, --firstname</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>First name of the user</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --lastname</kbd></td>
<td>Last name of the user</td>
</tr>
<tr><td class="option-group">
<kbd>-p, --password</kbd></td>
<td>Password of the user</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>--use_random_password</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Do not prompt for password. Use random string instead</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="pause">
<h3 class="sigil_not_in_toc">pause</h3>
<p>Pause a DAG</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">pause</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat2">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat5">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: ~/airflow/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="task_failed_deps">
<h3 class="sigil_not_in_toc">task_failed_deps</h3>
<p>Returns the unmet dependencies for a task instance from the perspective of the scheduler. In other words, why a task instance doesn&#x2019;t get scheduled, then queued by the scheduler, and then run by an executor.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">task_failed_deps</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span> <span class="n">task_id</span> <span class="n">execution_date</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat3">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
<tr><td class="option-group">
<kbd>task_id</kbd></td>
<td>The id of the task</td>
</tr>
<tr><td class="option-group">
<kbd>execution_date</kbd></td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat6">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: ~/airflow/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="version">
<h3 class="sigil_not_in_toc">version</h3>
<p>Show the version</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">version</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span>
</pre>
</div>
</div>
</div>
<div class="section" id="trigger_dag">
<h3 class="sigil_not_in_toc">trigger_dag</h3>
<p>Trigger a DAG run</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">trigger_dag</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">r</span> <span class="n">RUN_ID</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">c</span> <span class="n">CONF</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">e</span> <span class="n">EXEC_DATE</span><span class="p">]</span>
<span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat4">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat7">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: ~/airflow/dags</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-r, --run_id</kbd></td>
<td>Helps to identify this run</td>
</tr>
<tr><td class="option-group">
<kbd>-c, --conf</kbd></td>
<td>JSON string that gets pickled into the DagRun&#x2019;s conf attribute</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-e, --exec_date</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="initdb">
<h3 class="sigil_not_in_toc">initdb</h3>
<p>Initialize the metadata database</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">initdb</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span>
</pre>
</div>
</div>
</div>
<div class="section" id="test">
<h3 class="sigil_not_in_toc">test</h3>
<p>Test a task instance. This will run a task without checking for dependencies or recording its state in the database.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">test</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">dr</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">tp</span> <span class="n">TASK_PARAMS</span><span class="p">]</span>
<span class="n">dag_id</span> <span class="n">task_id</span> <span class="n">execution_date</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat5">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
<tr><td class="option-group">
<kbd>task_id</kbd></td>
<td>The id of the task</td>
</tr>
<tr><td class="option-group">
<kbd>execution_date</kbd></td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat8">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: ~/airflow/dags</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-dr, --dry_run</kbd></td>
<td><p class="first">Perform a dry run</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-tp, --task_params</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Sends a JSON params dict to the task</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="unpause">
<h3 class="sigil_not_in_toc">unpause</h3>
<p>Resume a paused DAG</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">unpause</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat6">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat9">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: $AIRFLOW_HOME/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
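<p>For example, to resume a previously paused DAG (the dag_id is illustrative):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow unpause tutorial
</pre>
</div>
</div>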
</div>
<div class="section" id="dag_state">
<h3 class="sigil_not_in_toc">dag_state</h3>
<p>Get the status of a dag run</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">dag_state</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span> <span class="n">execution_date</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat7">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
<tr><td class="option-group">
<kbd>execution_date</kbd></td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat10">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: $AIRFLOW_HOME/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
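<p>For example, the following prints the state of the dag run for that execution date, such as <code>running</code> or <code>success</code> (the dag_id is illustrative):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow dag_state tutorial 2015-06-01
</pre>
</div>
</div>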
</div>
<div class="section" id="run">
<h3 class="sigil_not_in_toc">run</h3>
<p>Run a single task instance</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">run</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">m</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">f</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">pool</span> <span class="n">POOL</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">cfg_path</span> <span class="n">CFG_PATH</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">l</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">A</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">i</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">I</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">ship_dag</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">p</span> <span class="n">PICKLE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="nb">int</span><span class="p">]</span>
<span class="n">dag_id</span> <span class="n">task_id</span> <span class="n">execution_date</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat8">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
<tr><td class="option-group">
<kbd>task_id</kbd></td>
<td>The id of the task</td>
</tr>
<tr><td class="option-group">
<kbd>execution_date</kbd></td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat11">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: $AIRFLOW_HOME/dags</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-m, --mark_success</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Mark jobs as succeeded without running them</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-f, --force</kbd></td>
<td><p class="first">Ignore previous task instance state, rerun regardless if task already succeeded/failed</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--pool</kbd></td>
<td>Resource pool to use</td>
</tr>
<tr><td class="option-group">
<kbd>--cfg_path</kbd></td>
<td>Path to config file to use instead of airflow.cfg</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --local</kbd></td>
<td><p class="first">Run the task using the LocalExecutor</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-A, --ignore_all_dependencies</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Ignores all non-critical dependencies, including ignore_ti_state and ignore_task_deps</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-i, --ignore_dependencies</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Ignore task-specific dependencies, e.g. upstream, depends_on_past, and retry delay dependencies</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-I, --ignore_depends_on_past</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Ignore depends_on_past dependencies (but respect upstream dependencies)</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--ship_dag</kbd></td>
<td><p class="first">Pickles (serializes) the DAG and ships it to the worker</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-p, --pickle</kbd></td>
<td>Serialized pickle object of the entire dag (used internally)</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-int, --interactive</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Do not capture standard output and error streams (useful for interactive debugging)</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
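<p>For example, to run a task instance of the <code>example_bash_operator</code> DAG that ships with Airflow's examples, and to force a rerun locally regardless of prior state:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow run example_bash_operator runme_0 2015-01-01
airflow run -f -l example_bash_operator runme_0 2015-01-01
</pre>
</div>
</div>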
</div>
<div class="section" id="list_tasks">
<h3 class="sigil_not_in_toc">list_tasks</h3>
<p>List the tasks within a DAG</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">list_tasks</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">t</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat9">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat12">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-t, --tree</kbd></td>
<td><p class="first">Tree view</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: $AIRFLOW_HOME/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
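<p>For example, to list the tasks of a DAG, flat or as a dependency tree (the dag_id is illustrative):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow list_tasks tutorial
airflow list_tasks tutorial --tree
</pre>
</div>
</div>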
</div>
<div class="section" id="backfill">
<h3 class="sigil_not_in_toc">backfill</h3>
<p>Run subsections of a DAG for a specified date range. If the --reset_dagruns option is used, backfill will first prompt the user whether Airflow should clear all previous dag_run and task_instance records within the backfill date range. If --rerun_failed_tasks is used, backfill will automatically re-run the previously failed task instances within the backfill date range.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">backfill</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">t</span> <span class="n">TASK_REGEX</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">s</span> <span class="n">START_DATE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">e</span> <span class="n">END_DATE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">m</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">l</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">x</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">i</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">I</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">pool</span> <span class="n">POOL</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">delay_on_limit</span> <span class="n">DELAY_ON_LIMIT</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">dr</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">v</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">c</span> <span class="n">CONF</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">reset_dagruns</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">rerun_failed_tasks</span><span class="p">]</span>
<span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat10">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat13">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group" colspan="2">
<kbd>-t, --task_regex</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>The regex to filter specific task_ids to backfill (optional)</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-s, --start_date</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Override start_date YYYY-MM-DD</td>
</tr>
<tr><td class="option-group">
<kbd>-e, --end_date</kbd></td>
<td>Override end_date YYYY-MM-DD</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-m, --mark_success</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Mark jobs as succeeded without running them</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --local</kbd></td>
<td><p class="first">Run the task using the LocalExecutor</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-x, --donot_pickle</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Do not attempt to pickle the DAG object to send over to the workers, just tell the workers to run their version of the code.</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-i, --ignore_dependencies</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Skip upstream tasks, run only the tasks matching the regexp. Only works in conjunction with task_regex</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-I, --ignore_first_depends_on_past</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Ignores depends_on_past dependencies for the first set of tasks only (subsequent executions in the backfill DO respect depends_on_past).</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: $AIRFLOW_HOME/dags</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--pool</kbd></td>
<td>Resource pool to use</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>--delay_on_limit</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Amount of time in seconds to wait when the limit on maximum active dag runs (max_active_runs) has been reached before trying to execute a dag run again.</p>
<p class="last">Default: 1.0</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-dr, --dry_run</kbd></td>
<td><p class="first">Perform a dry run</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-v, --verbose</kbd></td>
<td><p class="first">Make logging output more verbose</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-c, --conf</kbd></td>
<td>JSON string that gets pickled into the DagRun&#x2019;s conf attribute</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>--reset_dagruns</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">If set, the backfill will delete existing backfill-related DAG runs and start anew with fresh, running DAG runs</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>--rerun_failed_tasks</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">If set, the backfill will automatically re-run all failed tasks for the backfill date range instead of throwing exceptions</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
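<p>For example, to backfill one week of a DAG, and to rerun that range while auto-retrying previously failed task instances (the dag_id and dates are illustrative):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow backfill tutorial -s 2015-06-01 -e 2015-06-07
airflow backfill tutorial -s 2015-06-01 -e 2015-06-07 --rerun_failed_tasks
</pre>
</div>
</div>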
</div>
<div class="section" id="list_dags">
<h3 class="sigil_not_in_toc">list_dags</h3>
<p>List all the DAGs</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">list_dags</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">r</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat14">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: $AIRFLOW_HOME/dags</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-r, --report</kbd></td>
<td><p class="first">Show DagBag loading report</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
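<p>For example, to list all DAGs along with the DagBag loading report, which can help diagnose DAG files that fail to import:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow list_dags -r
</pre>
</div>
</div>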
</div>
<div class="section" id="kerberos">
<h3 class="sigil_not_in_toc">kerberos</h3>
<p>Start a kerberos ticket renewer</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">kerberos</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">kt</span> <span class="p">[</span><span class="n">KEYTAB</span><span class="p">]]</span> <span class="p">[</span><span class="o">--</span><span class="n">pid</span> <span class="p">[</span><span class="n">PID</span><span class="p">]]</span> <span class="p">[</span><span class="o">-</span><span class="n">D</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">stdout</span> <span class="n">STDOUT</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">stderr</span> <span class="n">STDERR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">l</span> <span class="n">LOG_FILE</span><span class="p">]</span>
<span class="p">[</span><span class="n">principal</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat11">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>principal</kbd></td>
<td><p class="first">kerberos principal</p>
<p class="last">Default: airflow</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat15">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-kt, --keytab</kbd></td>
<td><p class="first">keytab</p>
<p class="last">Default: airflow.keytab</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--pid</kbd></td>
<td>PID file location</td>
</tr>
<tr><td class="option-group">
<kbd>-D, --daemon</kbd></td>
<td><p class="first">Daemonize instead of running in the foreground</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--stdout</kbd></td>
<td>Redirect stdout to this file</td>
</tr>
<tr><td class="option-group">
<kbd>--stderr</kbd></td>
<td>Redirect stderr to this file</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --log-file</kbd></td>
<td>Location of the log file</td>
</tr>
</tbody>
</table>
</div>
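<p>For example, to run the ticket renewer as a daemon (the principal and keytab path below are hypothetical; use the values for your own Kerberos realm):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow kerberos airflow/host.example.com@EXAMPLE.COM -kt /etc/security/airflow.keytab -D
</pre>
</div>
</div>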
</div>
<div class="section" id="worker">
<h3 class="sigil_not_in_toc">worker</h3>
<p>Start a Celery worker node</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">worker</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">p</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">q</span> <span class="n">QUEUES</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">c</span> <span class="n">CONCURRENCY</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">cn</span> <span class="n">CELERY_HOSTNAME</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">pid</span> <span class="p">[</span><span class="n">PID</span><span class="p">]]</span> <span class="p">[</span><span class="o">-</span><span class="n">D</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">stdout</span> <span class="n">STDOUT</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">stderr</span> <span class="n">STDERR</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">l</span> <span class="n">LOG_FILE</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat16">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group" colspan="2">
<kbd>-p, --do_pickle</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Attempt to pickle the DAG object to send over to the workers, instead of letting workers run their version of the code.</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-q, --queues</kbd></td>
<td><p class="first">Comma delimited list of queues to serve</p>
<p class="last">Default: default</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-c, --concurrency</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">The number of worker processes</p>
<p class="last">Default: 16</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-cn, --celery_hostname</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Set the hostname of celery worker if you have multiple workers on a single machine.</td>
</tr>
<tr><td class="option-group">
<kbd>--pid</kbd></td>
<td>PID file location</td>
</tr>
<tr><td class="option-group">
<kbd>-D, --daemon</kbd></td>
<td><p class="first">Daemonize instead of running in the foreground</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--stdout</kbd></td>
<td>Redirect stdout to this file</td>
</tr>
<tr><td class="option-group">
<kbd>--stderr</kbd></td>
<td>Redirect stderr to this file</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --log-file</kbd></td>
<td>Location of the log file</td>
</tr>
</tbody>
</table>
</div>
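<p>For example, to start a worker with defaults, or a daemonized worker serving specific queues with a higher process count (the queue names are illustrative):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow worker
airflow worker -q default,backfill -c 8 -D
</pre>
</div>
</div>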
</div>
<div class="section" id="webserver">
<h3 class="sigil_not_in_toc">webserver</h3>
<p>Start an Airflow webserver instance</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">webserver</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">p</span> <span class="n">PORT</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">w</span> <span class="n">WORKERS</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">k</span> <span class="p">{</span><span class="n">sync</span><span class="p">,</span><span class="n">eventlet</span><span class="p">,</span><span class="n">gevent</span><span class="p">,</span><span class="n">tornado</span><span class="p">}]</span> <span class="p">[</span><span class="o">-</span><span class="n">t</span> <span class="n">WORKER_TIMEOUT</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">hn</span> <span class="n">HOSTNAME</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">pid</span> <span class="p">[</span><span class="n">PID</span><span class="p">]]</span> <span class="p">[</span><span class="o">-</span><span class="n">D</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">stdout</span> <span class="n">STDOUT</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">stderr</span> <span class="n">STDERR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">A</span> <span class="n">ACCESS_LOGFILE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">E</span> <span class="n">ERROR_LOGFILE</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">l</span> <span class="n">LOG_FILE</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">ssl_cert</span> <span class="n">SSL_CERT</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">ssl_key</span> <span class="n">SSL_KEY</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">d</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat17">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-p, --port</kbd></td>
<td><p class="first">The port on which to run the server</p>
<p class="last">Default: 8080</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-w, --workers</kbd></td>
<td><p class="first">Number of workers to run the webserver on</p>
<p class="last">Default: 4</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-k, --workerclass</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Possible choices: sync, eventlet, gevent, tornado</p>
<p>The worker class to use for Gunicorn</p>
<p class="last">Default: sync</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-t, --worker_timeout</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">The timeout for waiting on webserver workers</p>
<p class="last">Default: 120</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-hn, --hostname</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Set the hostname on which to run the web server</p>
<p class="last">Default: 0.0.0.0</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--pid</kbd></td>
<td>PID file location</td>
</tr>
<tr><td class="option-group">
<kbd>-D, --daemon</kbd></td>
<td><p class="first">Daemonize instead of running in the foreground</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--stdout</kbd></td>
<td>Redirect stdout to this file</td>
</tr>
<tr><td class="option-group">
<kbd>--stderr</kbd></td>
<td>Redirect stderr to this file</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-A, --access_logfile</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">The logfile to store the webserver access log. Use &#x2018;-&#x2019; to print to stderr.</p>
<p class="last">Default: -</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-E, --error_logfile</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">The logfile to store the webserver error log. Use &#x2018;-&#x2019; to print to stderr.</p>
<p class="last">Default: -</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --log-file</kbd></td>
<td>Location of the log file</td>
</tr>
<tr><td class="option-group">
<kbd>--ssl_cert</kbd></td>
<td>Path to the SSL certificate for the webserver</td>
</tr>
<tr><td class="option-group">
<kbd>--ssl_key</kbd></td>
<td>Path to the key to use with the SSL certificate</td>
</tr>
<tr><td class="option-group">
<kbd>-d, --debug</kbd></td>
<td><p class="first">Use the server that ships with Flask in debug mode</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
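<p>For example, to daemonize the webserver on the default port, or to serve it over HTTPS (the certificate and key paths are hypothetical):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow webserver -p 8080 -D
airflow webserver --ssl_cert /path/to/cert.pem --ssl_key /path/to/key.pem
</pre>
</div>
</div>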
</div>
<div class="section" id="flower">
<h3 class="sigil_not_in_toc">flower</h3>
<p>Start a Celery Flower</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">flower</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">hn</span> <span class="n">HOSTNAME</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">p</span> <span class="n">PORT</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">fc</span> <span class="n">FLOWER_CONF</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">u</span> <span class="n">URL_PREFIX</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">a</span> <span class="n">BROKER_API</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">pid</span> <span class="p">[</span><span class="n">PID</span><span class="p">]]</span> <span class="p">[</span><span class="o">-</span><span class="n">D</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">stdout</span> <span class="n">STDOUT</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">stderr</span> <span class="n">STDERR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">l</span> <span class="n">LOG_FILE</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat18">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group" colspan="2">
<kbd>-hn, --hostname</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Set the hostname on which to run the server</p>
<p class="last">Default: 0.0.0.0</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-p, --port</kbd></td>
<td><p class="first">The port on which to run the server</p>
<p class="last">Default: 5555</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-fc, --flower_conf</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Configuration file for flower</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-u, --url_prefix</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>URL prefix for Flower</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-a, --broker_api</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Broker API</td>
</tr>
<tr><td class="option-group">
<kbd>--pid</kbd></td>
<td>PID file location</td>
</tr>
<tr><td class="option-group">
<kbd>-D, --daemon</kbd></td>
<td><p class="first">Daemonize instead of running in the foreground</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--stdout</kbd></td>
<td>Redirect stdout to this file</td>
</tr>
<tr><td class="option-group">
<kbd>--stderr</kbd></td>
<td>Redirect stderr to this file</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --log-file</kbd></td>
<td>Location of the log file</td>
</tr>
</tbody>
</table>
</div>
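<p>For example, to daemonize Flower on its default port, optionally pointing it at the broker's management API (the URL below assumes a RabbitMQ broker with default guest credentials):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>airflow flower -p 5555 -D
airflow flower -a http://guest:guest@localhost:15672/api/
</pre>
</div>
</div>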
</div>
<div class="section" id="scheduler">
<h3 class="sigil_not_in_toc">scheduler</h3>
<p>Start a scheduler instance</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">scheduler</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">d</span> <span class="n">DAG_ID</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">r</span> <span class="n">RUN_DURATION</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">n</span> <span class="n">NUM_RUNS</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">p</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">pid</span> <span class="p">[</span><span class="n">PID</span><span class="p">]]</span> <span class="p">[</span><span class="o">-</span><span class="n">D</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">stdout</span> <span class="n">STDOUT</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">stderr</span> <span class="n">STDERR</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">l</span> <span class="n">LOG_FILE</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat19">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-d, --dag_id</kbd></td>
<td>The id of the dag to run</td>
</tr>
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: /Users/kaxil/airflow/dags</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-r, --run-duration</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Set number of seconds to execute before exiting</td>
</tr>
<tr><td class="option-group">
<kbd>-n, --num_runs</kbd></td>
<td><p class="first">Set the number of runs to execute before exiting</p>
<p class="last">Default: -1</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-p, --do_pickle</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Attempt to pickle the DAG object to send over to the workers, instead of letting workers run their version of the code.</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--pid</kbd></td>
<td>PID file location</td>
</tr>
<tr><td class="option-group">
<kbd>-D, --daemon</kbd></td>
<td><p class="first">Daemonize instead of running in the foreground</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>--stdout</kbd></td>
<td>Redirect stdout to this file</td>
</tr>
<tr><td class="option-group">
<kbd>--stderr</kbd></td>
<td>Redirect stderr to this file</td>
</tr>
<tr><td class="option-group">
<kbd>-l, --log-file</kbd></td>
<td>Location of the log file</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="task_state">
<h3 class="sigil_not_in_toc">task_state</h3>
<p>Get the status of a task instance</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">task_state</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span> <span class="n">dag_id</span> <span class="n">task_id</span> <span class="n">execution_date</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat12">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
<tr><td class="option-group">
<kbd>task_id</kbd></td>
<td>The id of the task</td>
</tr>
<tr><td class="option-group">
<kbd>execution_date</kbd></td>
<td>The execution date of the DAG</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat20">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: /Users/kaxil/airflow/dags</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="pool">
<h3 class="sigil_not_in_toc">pool</h3>
<p>CRUD operations on pools</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">pool</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">s</span> <span class="n">NAME</span> <span class="n">SLOT_COUNT</span> <span class="n">POOL_DESCRIPTION</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">g</span> <span class="n">NAME</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">x</span> <span class="n">NAME</span><span class="p">]</span>
</pre>
</div>
</div>
<div class="section" id="Named Arguments_repeat21">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-s, --set</kbd></td>
<td>Set pool slot count and description, respectively</td>
</tr>
<tr><td class="option-group">
<kbd>-g, --get</kbd></td>
<td>Get pool info</td>
</tr>
<tr><td class="option-group">
<kbd>-x, --delete</kbd></td>
<td>Delete a pool</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="serve_logs">
<h3 class="sigil_not_in_toc">serve_logs</h3>
<p>Serve logs generated by workers</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">serve_logs</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span>
</pre>
</div>
</div>
</div>
<div class="section" id="clear">
<h3 class="sigil_not_in_toc">clear</h3>
<p>Clear a set of task instances, as if they never ran</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">clear</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">t</span> <span class="n">TASK_REGEX</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">s</span> <span class="n">START_DATE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">e</span> <span class="n">END_DATE</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">sd</span> <span class="n">SUBDIR</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="n">u</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">d</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">c</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">f</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">r</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">x</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">dx</span><span class="p">]</span>
<span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat13">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat22">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group" colspan="2">
<kbd>-t, --task_regex</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>The regex to filter specific task_ids to backfill (optional)</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-s, --start_date</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td>Override start_date YYYY-MM-DD</td>
</tr>
<tr><td class="option-group">
<kbd>-e, --end_date</kbd></td>
<td>Override end_date YYYY-MM-DD</td>
</tr>
<tr><td class="option-group">
<kbd>-sd, --subdir</kbd></td>
<td><p class="first">File location or directory from which to look for the dag</p>
<p class="last">Default: /Users/kaxil/airflow/dags</p>
</td>
</tr>
<tr><td class="option-group">
<kbd>-u, --upstream</kbd></td>
<td><p class="first">Include upstream tasks</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-d, --downstream</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Include downstream tasks</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-c, --no_confirm</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Do not request confirmation</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-f, --only_failed</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Only failed jobs</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-r, --only_running</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Only running jobs</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-x, --exclude_subdags</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Exclude subdags</p>
<p class="last">Default: False</p>
</td>
</tr>
<tr><td class="option-group" colspan="2">
<kbd>-dx, --dag_regex</kbd></td>
</tr>
<tr><td>&#xA0;</td>
<td><p class="first">Search dag_id as regex instead of exact string</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="upgradedb">
<h3 class="sigil_not_in_toc">upgradedb</h3>
<p>Upgrade the metadata database to the latest version</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">upgradedb</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span>
</pre>
</div>
</div>
</div>
<div class="section" id="delete_dag">
<h3 class="sigil_not_in_toc">delete_dag</h3>
<p>Delete all DB records related to the specified DAG</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">delete_dag</span> <span class="p">[</span><span class="o">-</span><span class="n">h</span><span class="p">]</span> <span class="p">[</span><span class="o">-</span><span class="n">y</span><span class="p">]</span> <span class="n">dag_id</span>
</pre>
</div>
</div>
<div class="section" id="Positional Arguments_repeat14">
<h4 class="sigil_not_in_toc">Positional Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>dag_id</kbd></td>
<td>The id of the dag</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="Named Arguments_repeat23">
<h4 class="sigil_not_in_toc">Named Arguments</h4>
<table class="docutils option-list" frame="void" rules="none">
<colgroup><col class="option">
<col class="description">
</colgroup>
<tbody valign="top">
<tr><td class="option-group">
<kbd>-y, --yes</kbd></td>
<td><p class="first">Do not prompt to confirm reset. Use with care!</p>
<p class="last">Default: False</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Scheduling &amp; Triggers</h1>
<p>The Airflow scheduler monitors all tasks and all DAGs, and triggers the
task instances whose dependencies have been met. Behind the scenes,
it monitors and stays in sync with a folder for all DAG objects it may contain,
and periodically (every minute or so) inspects active tasks to see whether
they can be triggered.</p>
<p>The Airflow scheduler is designed to run as a persistent service in an
Airflow production environment. To kick it off, all you need to do is
execute <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">scheduler</span></code>. It will use the configuration specified in
<code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</p>
<p>Note that if you run a DAG on a <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> of one day,
the run stamped <code class="docutils literal notranslate"><span class="pre">2016-01-01</span></code> will be triggered soon after <code class="docutils literal notranslate"><span class="pre">2016-01-01T23:59</span></code>.
In other words, the job instance is started once the period it covers
has ended.</p>
<p><strong>Let&#x2019;s Repeat That</strong> The scheduler runs your job one <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> AFTER the
start date, at the END of the period.</p>
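<p>As a quick sanity check of the timing rule above, here is a minimal sketch in plain Python (the <code class="docutils literal notranslate"><span class="pre">first_trigger_time</span></code> helper is purely illustrative, not part of Airflow&#x2019;s API):</p>

```python
from datetime import datetime, timedelta

def first_trigger_time(execution_date, schedule_interval):
    """Illustrative helper (not Airflow API): a run stamped with
    execution_date covers [execution_date, execution_date + interval)
    and is triggered once that period has ended."""
    return execution_date + schedule_interval

# The daily run stamped 2016-01-01 fires at midnight on 2016-01-02,
# i.e. soon after 2016-01-01T23:59.
print(first_trigger_time(datetime(2016, 1, 1), timedelta(days=1)))
```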
<p>The scheduler starts an instance of the executor specified in your
<code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>. If it happens to be the <code class="docutils literal notranslate"><span class="pre">LocalExecutor</span></code>, tasks will be
executed as subprocesses; in the case of <code class="docutils literal notranslate"><span class="pre">CeleryExecutor</span></code> and
<code class="docutils literal notranslate"><span class="pre">MesosExecutor</span></code>, tasks are executed remotely.</p>
<p>To start a scheduler, simply run the command:</p>
<div class="code bash highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">airflow</span> <span class="n">scheduler</span>
</pre>
</div>
</div>
<div class="section" id="dag-runs">
<h2 class="sigil_not_in_toc">DAG Runs</h2>
<p>A DAG Run is an object representing an instantiation of the DAG in time.</p>
<p>Each DAG may or may not have a schedule, which informs how <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Runs</span></code> are
created. <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> is defined as a DAG argument, and accepts,
preferably, a
<a class="reference external" href="https://en.wikipedia.org/wiki/Cron#CRON_expression">cron expression</a> as
a <code class="docutils literal notranslate"><span class="pre">str</span></code>, or a <code class="docutils literal notranslate"><span class="pre">datetime.timedelta</span></code> object. Alternatively, you can
use one of these cron &#x201C;presets&#x201D;:</p>
<table border="1" class="docutils">
<colgroup>
<col width="15%">
<col width="69%">
<col width="16%">
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">preset</th>
<th class="head">meaning</th>
<th class="head">cron</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">None</span></code></td>
<td>Don&#x2019;t schedule; use exclusively for &#x201C;externally triggered&#x201D;
DAGs</td>
<td>&#xA0;</td>
</tr>
<tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">@once</span></code></td>
<td>Schedule once and only once</td>
<td>&#xA0;</td>
</tr>
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">@hourly</span></code></td>
<td>Run once an hour at the beginning of the hour</td>
<td><code class="docutils literal notranslate"><span class="pre">0</span> <span class="pre">*</span> <span class="pre">*</span> <span class="pre">*</span> <span class="pre">*</span></code></td>
</tr>
<tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">@daily</span></code></td>
<td>Run once a day at midnight</td>
<td><code class="docutils literal notranslate"><span class="pre">0</span> <span class="pre">0</span> <span class="pre">*</span> <span class="pre">*</span> <span class="pre">*</span></code></td>
</tr>
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">@weekly</span></code></td>
<td>Run once a week at midnight on Sunday morning</td>
<td><code class="docutils literal notranslate"><span class="pre">0</span> <span class="pre">0</span> <span class="pre">*</span> <span class="pre">*</span> <span class="pre">0</span></code></td>
</tr>
<tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">@monthly</span></code></td>
<td>Run once a month at midnight of the first day of the month</td>
<td><code class="docutils literal notranslate"><span class="pre">0</span> <span class="pre">0</span> <span class="pre">1</span> <span class="pre">*</span> <span class="pre">*</span></code></td>
</tr>
<tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">@yearly</span></code></td>
<td>Run once a year at midnight of January 1</td>
<td><code class="docutils literal notranslate"><span class="pre">0</span> <span class="pre">0</span> <span class="pre">1</span> <span class="pre">1</span> <span class="pre">*</span></code></td>
</tr>
</tbody>
</table>
<p>Your DAG will be instantiated once for each schedule, and a
<code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Run</span></code> entry will be created for each.</p>
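<p>The presets above are shorthand for ordinary cron expressions. The equivalence can be sketched as follows (the dict simply mirrors the table; it is illustrative, not Airflow&#x2019;s internal code):</p>

```python
# Illustrative preset-to-cron mapping, mirroring the table above.
CRON_PRESETS = {
    "@hourly":  "0 * * * *",
    "@daily":   "0 0 * * *",
    "@weekly":  "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly":  "0 0 1 1 *",
}

def normalize_schedule(schedule_interval):
    """Resolve a preset such as '@daily' to its cron expression;
    pass any other value (cron string, timedelta) through unchanged."""
    return CRON_PRESETS.get(schedule_interval, schedule_interval)

print(normalize_schedule("@daily"))       # 0 0 * * *
print(normalize_schedule("*/5 * * * *"))  # */5 * * * *
```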
<p>DAG Runs have a state associated with them (running, failed, success) that
informs the scheduler which set of schedules should be evaluated for
task submissions. Without this metadata at the DAG Run level, the Airflow
scheduler would have much more work to do to figure out which tasks
should be triggered, and would slow to a crawl. It might also cause undesired
processing when you change the shape of your DAG by, say, adding new
tasks.</p>
</div>
<div class="section" id="backfill-and-catchup">
<h2 class="sigil_not_in_toc">Backfill and Catchup</h2>
<p>An Airflow DAG with a <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, possibly an <code class="docutils literal notranslate"><span class="pre">end_date</span></code>, and a <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> defines a
series of intervals which the scheduler turns into individual DAG Runs and executes. A key capability of
Airflow is that these DAG Runs are atomic, idempotent items, and the scheduler, by default, will examine
the lifetime of the DAG (from start to end/now, one interval at a time) and kick off a DAG Run for any
interval that has not been run (or has been cleared). This concept is called Catchup.</p>
<p>If your DAG is written to handle its own catchup (i.e., not limited to the interval, but instead keyed to &#x201C;now&#x201D;),
then you will want to turn catchup off, either on the DAG itself with <code class="docutils literal notranslate"><span class="pre">dag.catchup</span> <span class="pre">=</span>
<span class="pre">False</span></code> or by default at the configuration file level with <code class="docutils literal notranslate"><span class="pre">catchup_by_default</span> <span class="pre">=</span> <span class="pre">False</span></code>. This
instructs the scheduler to create a DAG Run only for the most current instance of the DAG
interval series.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">Code that goes along with the Airflow tutorial located at:</span>
<span class="sd">https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="kn">from</span> <span class="nn">airflow</span> <span class="k">import</span> <span class="n">DAG</span>
<span class="kn">from</span> <span class="nn">airflow.operators.bash_operator</span> <span class="k">import</span> <span class="n">BashOperator</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&apos;owner&apos;</span><span class="p">:</span> <span class="s1">&apos;airflow&apos;</span><span class="p">,</span>
<span class="s1">&apos;depends_on_past&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;start_date&apos;</span><span class="p">:</span> <span class="n">datetime</span><span class="p">(</span><span class="mi">2015</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s1">&apos;email&apos;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&apos;airflow@example.com&apos;</span><span class="p">],</span>
<span class="s1">&apos;email_on_failure&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;email_on_retry&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;retries&apos;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">&apos;retry_delay&apos;</span><span class="p">:</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">5</span><span class="p">),</span>
<span class="s1">&apos;schedule_interval&apos;</span><span class="p">:</span> <span class="s1">&apos;@hourly&apos;</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;tutorial&apos;</span><span class="p">,</span> <span class="n">catchup</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span>
</pre>
</div>
</div>
<p>In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM (or from the
command line), a single DAG Run will be created, with an <code class="docutils literal notranslate"><span class="pre">execution_date</span></code> of 2016-01-01, and the next
one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02.</p>
<p>If the <code class="docutils literal notranslate"><span class="pre">dag.catchup</span></code> value had been True instead, the scheduler would have created a DAG Run for each
completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval
hasn&#x2019;t completed) and would execute them sequentially. This behavior is great for atomic
datasets that can easily be split into periods. Turning catchup off is great if your DAG Runs perform
backfill internally.</p>
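<p>The catchup behaviour described above can be sketched with plain datetime arithmetic. Assuming a daily interval for simplicity, the hypothetical helper below enumerates the runs the scheduler would create (it is illustrative, not Airflow&#x2019;s scheduler code):</p>

```python
from datetime import datetime, timedelta

def completed_intervals(start_date, interval, now):
    """Illustrative (not Airflow API): yield the execution_date of
    every schedule interval that has fully elapsed by `now` -- the
    DAG Runs that catchup=True would create."""
    execution_date = start_date
    while execution_date + interval <= now:
        yield execution_date
        execution_date += interval

runs = list(completed_intervals(datetime(2015, 12, 1),
                                timedelta(days=1),
                                datetime(2016, 1, 2, 6)))
print(len(runs))   # runs stamped 2015-12-01 through 2016-01-01
print(runs[-1])    # the 2016-01-02 interval hasn't completed yet
```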
</div>
<div class="section" id="external-triggers">
<h2 class="sigil_not_in_toc">External Triggers</h2>
<p>Note that <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Runs</span></code> can also be created manually through the CLI by
running the <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">trigger_dag</span></code> command, where you can define a
specific <code class="docutils literal notranslate"><span class="pre">run_id</span></code>. The <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Runs</span></code> created externally to the
scheduler are associated with the trigger&#x2019;s timestamp, and will be displayed
in the UI alongside scheduled <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">runs</span></code>.</p>
<p>In addition, you can also manually trigger a <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Run</span></code> using the web UI (tab &#x201C;DAGs&#x201D; -&gt; column &#x201C;Links&#x201D; -&gt; button &#x201C;Trigger Dag&#x201D;).</p>
</div>
<div class="section" id="to-keep-in-mind">
<h2 class="sigil_not_in_toc">To Keep in Mind</h2>
<ul class="simple">
<li>The first <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Run</span></code> is created based on the minimum <code class="docutils literal notranslate"><span class="pre">start_date</span></code> for the
tasks in your DAG.</li>
<li>Subsequent <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Runs</span></code> are created by the scheduler process, based on
your DAG&#x2019;s <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>, sequentially.</li>
<li>When clearing a set of tasks&#x2019; state in the hope of getting them to re-run,
it is important to keep in mind the <code class="docutils literal notranslate"><span class="pre">DAG</span> <span class="pre">Run</span></code>&#x2019;s state too, as it defines
whether the scheduler should look into triggering tasks for that run.</li>
</ul>
<p>Here are some of the ways you can <strong>unblock tasks</strong>:</p>
<ul class="simple">
<li>From the UI, you can <strong>clear</strong> (as in delete the status of) individual task instances
from the task instances dialog, while defining whether you want to include the past/future
and the upstream/downstream dependencies. Note that a confirmation window comes next and
allows you to see the set you are about to clear. You can also clear all task instances
associated with the dag.</li>
<li>The CLI command <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">clear</span> <span class="pre">-h</span></code> has lots of options when it comes to clearing task instance
states, including specifying date ranges, targeting task_ids by specifying a regular expression,
flags for including upstream and downstream relatives, and targeting task instances in specific
states (<code class="docutils literal notranslate"><span class="pre">failed</span></code> or <code class="docutils literal notranslate"><span class="pre">success</span></code>).</li>
<li>Clearing a task instance will no longer delete the task instance record. Instead, it updates
max_tries and sets the current task instance state to None.</li>
<li>Marking task instances as failed can be done through the UI. This can be used to stop running task instances.</li>
<li>Marking task instances as successful can be done through the UI. This is mostly to fix false negatives,
or for instance when the fix has been applied outside of Airflow.</li>
<li>The <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">backfill</span></code> CLI subcommand has a <code class="docutils literal notranslate"><span class="pre">--mark_success</span></code> flag and allows selecting
subsections of the DAG as well as specifying date ranges.</li>
</ul>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Plugins</h1>
<p>Airflow has a simple built-in plugin manager that can integrate external
features into its core simply by dropping files into your
<code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/plugins</span></code> folder.</p>
<p>The Python modules in the <code class="docutils literal notranslate"><span class="pre">plugins</span></code> folder are imported,
and <strong>hooks</strong>, <strong>operators</strong>, <strong>sensors</strong>, <strong>macros</strong>, <strong>executors</strong> and web <strong>views</strong>
are integrated into Airflow&#x2019;s main collections and become available for use.</p>
<div class="section" id="what-for">
<h2 class="sigil_not_in_toc">What for?</h2>
<p>Airflow offers a generic toolbox for working with data. Different
organizations have different stacks and different needs. Using Airflow
plugins can be a way for companies to customize their Airflow installation
to reflect their ecosystem.</p>
<p>Plugins can be used as an easy way to write, share and activate new sets of
features.</p>
<p>There&#x2019;s also a need for a set of more complex applications to interact with
different flavors of data and metadata.</p>
<p>Examples:</p>
<ul class="simple">
<li>A set of tools to parse Hive logs and expose Hive metadata (CPU /IO / phases/ skew /&#x2026;)</li>
<li>An anomaly detection framework, allowing people to collect metrics, set thresholds and alerts</li>
<li>An auditing tool, helping understand who accesses what</li>
<li>A config-driven SLA monitoring tool, allowing you to set monitored tables and at what time
they should land, alert people, and expose visualizations of outages</li>
<li>&#x2026;</li>
</ul>
</div>
<div class="section" id="why-build-on-top-of-airflow">
<h2 class="sigil_not_in_toc">Why build on top of Airflow?</h2>
<p>Airflow has many components that can be reused when building an application:</p>
<ul class="simple">
<li>A web server you can use to render your views</li>
<li>A metadata database to store your models</li>
<li>Access to your databases, and knowledge of how to connect to them</li>
<li>An array of workers that your application can push workload to</li>
<li>Since Airflow is already deployed, you can just piggyback on its deployment logistics</li>
<li>Basic charting capabilities, underlying libraries and abstractions</li>
</ul>
</div>
<div class="section" id="interface">
<h2 class="sigil_not_in_toc">Interface</h2>
<p>To create a plugin you will need to derive the
<code class="docutils literal notranslate"><span class="pre">airflow.plugins_manager.AirflowPlugin</span></code> class and reference the objects
you want to plug into Airflow. Here&#x2019;s what the class you need to derive
looks like:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">AirflowPlugin</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="c1"># The name of your plugin (str)</span>
<span class="n">name</span> <span class="o">=</span> <span class="kc">None</span>
<span class="c1"># A list of class(es) derived from BaseOperator</span>
<span class="n">operators</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of class(es) derived from BaseSensorOperator</span>
<span class="n">sensors</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of class(es) derived from BaseHook</span>
<span class="n">hooks</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of class(es) derived from BaseExecutor</span>
<span class="n">executors</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of references to inject into the macros namespace</span>
<span class="n">macros</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of objects created from a class derived</span>
<span class="c1"># from flask_admin.BaseView</span>
<span class="n">admin_views</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of Blueprint object created from flask.Blueprint</span>
<span class="n">flask_blueprints</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># A list of menu links (flask_admin.base.MenuLink)</span>
<span class="n">menu_links</span> <span class="o">=</span> <span class="p">[]</span>
</pre>
</div>
</div>
<p>You derive it by inheritance (please refer to the example below).
Note that the <code class="docutils literal notranslate"><span class="pre">name</span></code> attribute of this class must be specified.</p>
<p>After the plugin is imported into Airflow,
you can reference its objects with a statement like</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">airflow.</span><span class="p">{</span><span class="nb">type</span><span class="p">,</span> <span class="n">like</span> <span class="s2">&quot;operators&quot;</span><span class="p">,</span> <span class="s2">&quot;sensors&quot;</span><span class="p">}</span><span class="o">.</span><span class="p">{</span><span class="n">name</span> <span class="n">specified</span> <span class="n">inside</span> <span class="n">the</span> <span class="n">plugin</span> <span class="n">class</span><span class="p">}</span> <span class="kn">import</span> <span class="o">*</span>
</pre>
</div>
</div>
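<p>The mechanism behind this import style can be sketched in plain Python (a simplified stand-in, not the real <code class="docutils literal notranslate"><span class="pre">airflow.plugins_manager</span></code> code; the plugin name <code class="docutils literal notranslate"><span class="pre">test_plugin</span></code> matches the example below):</p>

```python
import sys
import types

# Stand-in for a class listed in a plugin's `operators` list (illustrative).
class PluginOperator:
    pass

# The plugin manager effectively builds a module named after the plugin under
# the matching airflow namespace and attaches the plugin's classes to it.
# Stub parent packages are registered here so the dotted import below resolves.
for name in ("airflow", "airflow.operators"):
    sys.modules.setdefault(name, types.ModuleType(name))

module = types.ModuleType("airflow.operators.test_plugin")
module.PluginOperator = PluginOperator
sys.modules["airflow.operators.test_plugin"] = module

# The documented import style now works:
from airflow.operators.test_plugin import PluginOperator as Imported
```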
<p>When you write your own plugins, make sure you honor the contract of each plugin type;
each type has some required methods. For example:</p>
<ul class="simple">
<li>For an <code class="docutils literal notranslate"><span class="pre">Operator</span></code> plugin, an <code class="docutils literal notranslate"><span class="pre">execute</span></code> method is compulsory.</li>
<li>For a <code class="docutils literal notranslate"><span class="pre">Sensor</span></code> plugin, a <code class="docutils literal notranslate"><span class="pre">poke</span></code> method returning a Boolean value is compulsory.</li>
</ul>
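<p>The two contracts can be sketched with stand-in base classes (an illustration only; the real <code class="docutils literal notranslate"><span class="pre">BaseOperator</span></code> and <code class="docutils literal notranslate"><span class="pre">BaseSensorOperator</span></code> take more constructor arguments):</p>

```python
import os

# Stand-ins for the real Airflow base classes (illustrative only).
class BaseOperator:
    pass

class BaseSensorOperator(BaseOperator):
    pass

class HelloOperator(BaseOperator):
    def execute(self, context):
        # Compulsory for an Operator plugin: perform the work.
        return "hello"

class PathSensor(BaseSensorOperator):
    def __init__(self, path):
        self.path = path

    def poke(self, context):
        # Compulsory for a Sensor plugin: return True once the condition is met.
        return os.path.exists(self.path)
```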
</div>
<div class="section" id="example">
<h2 class="sigil_not_in_toc">Example</h2>
<p>The code below defines a plugin that injects a set of dummy object
definitions into Airflow.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># This is the class you derive to create a plugin</span>
<span class="kn">from</span> <span class="nn">airflow.plugins_manager</span> <span class="k">import</span> <span class="n">AirflowPlugin</span>
<span class="kn">from</span> <span class="nn">flask</span> <span class="k">import</span> <span class="n">Blueprint</span>
<span class="kn">from</span> <span class="nn">flask_admin</span> <span class="k">import</span> <span class="n">BaseView</span><span class="p">,</span> <span class="n">expose</span>
<span class="kn">from</span> <span class="nn">flask_admin.base</span> <span class="k">import</span> <span class="n">MenuLink</span>
<span class="c1"># Importing base classes that we need to derive</span>
<span class="kn">from</span> <span class="nn">airflow.hooks.base_hook</span> <span class="k">import</span> <span class="n">BaseHook</span>
<span class="kn">from</span> <span class="nn">airflow.models</span> <span class="k">import</span> <span class="n">BaseOperator</span>
<span class="kn">from</span> <span class="nn">airflow.sensors.base_sensor_operator</span> <span class="k">import</span> <span class="n">BaseSensorOperator</span>
<span class="kn">from</span> <span class="nn">airflow.executors.base_executor</span> <span class="k">import</span> <span class="n">BaseExecutor</span>
<span class="c1"># Will show up under airflow.hooks.test_plugin.PluginHook</span>
<span class="k">class</span> <span class="nc">PluginHook</span><span class="p">(</span><span class="n">BaseHook</span><span class="p">):</span>
<span class="k">pass</span>
<span class="c1"># Will show up under airflow.operators.test_plugin.PluginOperator</span>
<span class="k">class</span> <span class="nc">PluginOperator</span><span class="p">(</span><span class="n">BaseOperator</span><span class="p">):</span>
<span class="k">pass</span>
<span class="c1"># Will show up under airflow.sensors.test_plugin.PluginSensorOperator</span>
<span class="k">class</span> <span class="nc">PluginSensorOperator</span><span class="p">(</span><span class="n">BaseSensorOperator</span><span class="p">):</span>
<span class="k">pass</span>
<span class="c1"># Will show up under airflow.executors.test_plugin.PluginExecutor</span>
<span class="k">class</span> <span class="nc">PluginExecutor</span><span class="p">(</span><span class="n">BaseExecutor</span><span class="p">):</span>
<span class="k">pass</span>
<span class="c1"># Will show up under airflow.macros.test_plugin.plugin_macro</span>
<span class="k">def</span> <span class="nf">plugin_macro</span><span class="p">():</span>
<span class="k">pass</span>
<span class="c1"># Creating a flask admin BaseView</span>
<span class="k">class</span> <span class="nc">TestView</span><span class="p">(</span><span class="n">BaseView</span><span class="p">):</span>
<span class="nd">@expose</span><span class="p">(</span><span class="s1">&apos;/&apos;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># in this example, put your test_plugin/test.html template at airflow/plugins/templates/test_plugin/test.html</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="s2">&quot;test_plugin/test.html&quot;</span><span class="p">,</span> <span class="n">content</span><span class="o">=</span><span class="s2">&quot;Hello galaxy!&quot;</span><span class="p">)</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">TestView</span><span class="p">(</span><span class="n">category</span><span class="o">=</span><span class="s2">&quot;Test Plugin&quot;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s2">&quot;Test View&quot;</span><span class="p">)</span>
<span class="c1"># Creating a flask blueprint to integrate the templates and static folder</span>
<span class="n">bp</span> <span class="o">=</span> <span class="n">Blueprint</span><span class="p">(</span>
<span class="s2">&quot;test_plugin&quot;</span><span class="p">,</span> <span class="vm">__name__</span><span class="p">,</span>
<span class="n">template_folder</span><span class="o">=</span><span class="s1">&apos;templates&apos;</span><span class="p">,</span> <span class="c1"># registers airflow/plugins/templates as a Jinja template folder</span>
<span class="n">static_folder</span><span class="o">=</span><span class="s1">&apos;static&apos;</span><span class="p">,</span>
<span class="n">static_url_path</span><span class="o">=</span><span class="s1">&apos;/static/test_plugin&apos;</span><span class="p">)</span>
<span class="n">ml</span> <span class="o">=</span> <span class="n">MenuLink</span><span class="p">(</span>
<span class="n">category</span><span class="o">=</span><span class="s1">&apos;Test Plugin&apos;</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s1">&apos;Test Menu Link&apos;</span><span class="p">,</span>
<span class="n">url</span><span class="o">=</span><span class="s1">&apos;https://airflow.incubator.apache.org/&apos;</span><span class="p">)</span>
<span class="c1"># Defining the plugin class</span>
<span class="k">class</span> <span class="nc">AirflowTestPlugin</span><span class="p">(</span><span class="n">AirflowPlugin</span><span class="p">):</span>
<span class="n">name</span> <span class="o">=</span> <span class="s2">&quot;test_plugin&quot;</span>
<span class="n">operators</span> <span class="o">=</span> <span class="p">[</span><span class="n">PluginOperator</span><span class="p">]</span>
<span class="n">sensors</span> <span class="o">=</span> <span class="p">[</span><span class="n">PluginSensorOperator</span><span class="p">]</span>
<span class="n">hooks</span> <span class="o">=</span> <span class="p">[</span><span class="n">PluginHook</span><span class="p">]</span>
<span class="n">executors</span> <span class="o">=</span> <span class="p">[</span><span class="n">PluginExecutor</span><span class="p">]</span>
<span class="n">macros</span> <span class="o">=</span> <span class="p">[</span><span class="n">plugin_macro</span><span class="p">]</span>
<span class="n">admin_views</span> <span class="o">=</span> <span class="p">[</span><span class="n">v</span><span class="p">]</span>
<span class="n">flask_blueprints</span> <span class="o">=</span> <span class="p">[</span><span class="n">bp</span><span class="p">]</span>
<span class="n">menu_links</span> <span class="o">=</span> <span class="p">[</span><span class="n">ml</span><span class="p">]</span>
</pre>
</div>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Security</h1>
<p>By default, all gates are open. An easy way to restrict access
to the web application is to do it at the network level, or by using
SSH tunnels.</p>
<p>It is however possible to switch on authentication by either using one of the supplied
backends or creating your own.</p>
<p>Be sure to check out <a class="reference internal" href="api.html"><span class="doc">Experimental Rest API</span></a> for securing the API.</p>
<div class="section" id="web-authentication">
<h2 class="sigil_not_in_toc">Web Authentication</h2>
<div class="section" id="password">
<h3 class="sigil_not_in_toc">Password</h3>
<p>One of the simplest mechanisms for authentication is requiring users to specify a password before logging in.
Password authentication requires the use of the <code class="docutils literal notranslate"><span class="pre">password</span></code> subpackage in your requirements file. Passwords
are hashed with bcrypt before they are stored.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">authenticate</span> <span class="o">=</span> True
<span class="nv">auth_backend</span> <span class="o">=</span> airflow.contrib.auth.backends.password_auth
</pre>
</div>
</div>
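<p>What &#x201C;hashing before storing&#x201D; buys you can be illustrated with the standard library (an illustration only: the real backend uses bcrypt, which is not in the stdlib, so PBKDF2 stands in for it here):</p>

```python
import hashlib
import os

def hash_password(password, salt=None):
    # Derive a salted, slow hash; the stored pair never reveals the password.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    # Re-derive with the stored salt and compare.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000) == digest

salt, digest = hash_password("set_the_password")
```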
<p>When password auth is enabled, an initial user credential will need to be created before anyone can log in. No initial
user is created by the migrations for this authentication backend, to protect default Airflow installations from
attack. Creating a new user has to be done via a Python REPL on the same machine where Airflow is installed.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># navigate to the airflow installation directory</span>
$ <span class="nb">cd</span> ~/airflow
$ python
Python <span class="m">2</span>.7.9 <span class="o">(</span>default, Feb <span class="m">10</span> <span class="m">2015</span>, <span class="m">03</span>:28:08<span class="o">)</span>
Type <span class="s2">&quot;help&quot;</span>, <span class="s2">&quot;copyright&quot;</span>, <span class="s2">&quot;credits&quot;</span> or <span class="s2">&quot;license&quot;</span> <span class="k">for</span> more information.
&gt;&gt;&gt; import airflow
&gt;&gt;&gt; from airflow import models, settings
&gt;&gt;&gt; from airflow.contrib.auth.backends.password_auth import PasswordUser
&gt;&gt;&gt; <span class="nv">user</span> <span class="o">=</span> PasswordUser<span class="o">(</span>models.User<span class="o">())</span>
&gt;&gt;&gt; user.username <span class="o">=</span> <span class="s1">&apos;new_user_name&apos;</span>
&gt;&gt;&gt; user.email <span class="o">=</span> <span class="s1">&apos;new_user_email@example.com&apos;</span>
&gt;&gt;&gt; user.password <span class="o">=</span> <span class="s1">&apos;set_the_password&apos;</span>
&gt;&gt;&gt; <span class="nv">session</span> <span class="o">=</span> settings.Session<span class="o">()</span>
&gt;&gt;&gt; session.add<span class="o">(</span>user<span class="o">)</span>
&gt;&gt;&gt; session.commit<span class="o">()</span>
&gt;&gt;&gt; session.close<span class="o">()</span>
&gt;&gt;&gt; exit<span class="o">()</span>
</pre>
</div>
</div>
</div>
<div class="section" id="ldap">
<h3 class="sigil_not_in_toc">LDAP</h3>
<p>To turn on LDAP authentication, configure your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> as follows. Please note that the example uses
an encrypted connection to the LDAP server, as you probably do not want passwords to be readable at the network level.
It is, however, possible to configure an unencrypted connection if you really want to.</p>
<p>Additionally, if you are using Active Directory, and are not explicitly specifying an OU that your users are in,
you will need to change <code class="docutils literal notranslate"><span class="pre">search_scope</span></code> to &#x201C;SUBTREE&#x201D;.</p>
<p>Valid search_scope options can be found in the <a class="reference external" href="http://ldap3.readthedocs.org/searches.html?highlight=search_scope">ldap3 Documentation</a></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">authenticate</span> <span class="o">=</span> True
<span class="nv">auth_backend</span> <span class="o">=</span> airflow.contrib.auth.backends.ldap_auth
<span class="o">[</span>ldap<span class="o">]</span>
<span class="c1"># set a connection without encryption: uri = ldap://&lt;your.ldap.server&gt;:&lt;port&gt;</span>
<span class="nv">uri</span> <span class="o">=</span> ldaps://&lt;your.ldap.server&gt;:&lt;port&gt;
<span class="nv">user_filter</span> <span class="o">=</span> <span class="nv">objectClass</span><span class="o">=</span>*
<span class="c1"># in case of Active Directory you would use: user_name_attr = sAMAccountName</span>
<span class="nv">user_name_attr</span> <span class="o">=</span> uid
<span class="c1"># group_member_attr should be set accordingly with *_filter</span>
<span class="c1"># eg :</span>
<span class="c1"># group_member_attr = groupMembership</span>
<span class="c1"># superuser_filter = groupMembership=CN=airflow-super-users...</span>
<span class="nv">group_member_attr</span> <span class="o">=</span> memberOf
<span class="nv">superuser_filter</span> <span class="o">=</span> <span class="nv">memberOf</span><span class="o">=</span><span class="nv">CN</span><span class="o">=</span>airflow-super-users,OU<span class="o">=</span>Groups,OU<span class="o">=</span>RWC,OU<span class="o">=</span>US,OU<span class="o">=</span>NORAM,DC<span class="o">=</span>example,DC<span class="o">=</span>com
<span class="nv">data_profiler_filter</span> <span class="o">=</span> <span class="nv">memberOf</span><span class="o">=</span><span class="nv">CN</span><span class="o">=</span>airflow-data-profilers,OU<span class="o">=</span>Groups,OU<span class="o">=</span>RWC,OU<span class="o">=</span>US,OU<span class="o">=</span>NORAM,DC<span class="o">=</span>example,DC<span class="o">=</span>com
<span class="nv">bind_user</span> <span class="o">=</span> <span class="nv">cn</span><span class="o">=</span>Manager,dc<span class="o">=</span>example,dc<span class="o">=</span>com
<span class="nv">bind_password</span> <span class="o">=</span> insecure
<span class="nv">basedn</span> <span class="o">=</span> <span class="nv">dc</span><span class="o">=</span>example,dc<span class="o">=</span>com
<span class="nv">cacert</span> <span class="o">=</span> /etc/ca/ldap_ca.crt
<span class="c1"># Set search_scope to one of them: BASE, LEVEL , SUBTREE</span>
<span class="c1"># Set search_scope to SUBTREE if using Active Directory, and not specifying an Organizational Unit</span>
<span class="nv">search_scope</span> <span class="o">=</span> LEVEL
</pre>
</div>
</div>
<p>The superuser_filter and data_profiler_filter are optional. If defined, these configurations allow you to specify LDAP groups that users must belong to in order to have superuser (admin) and data-profiler permissions. If undefined, all users will be superusers and data profilers.</p>
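<p>The decision these filters drive can be sketched as a membership check of a user&#x2019;s <code class="docutils literal notranslate"><span class="pre">memberOf</span></code> values (a simplification; in reality the ldap3 library evaluates the filter against the directory):</p>

```python
def passes_group_filter(member_of_values, filter_expr):
    # filter_expr looks like "memberOf=CN=airflow-super-users,...,DC=example,DC=com"
    attr, _, wanted_dn = filter_expr.partition("=")
    if attr != "memberOf":
        return False
    # Case-insensitive comparison, as DNs are usually matched that way.
    return any(dn.lower() == wanted_dn.lower() for dn in member_of_values)

groups = ["CN=airflow-super-users,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com"]
```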
</div>
<div class="section" id="roll-your-own">
<h3 class="sigil_not_in_toc">Roll your own</h3>
<p>Airflow uses <code class="docutils literal notranslate"><span class="pre">flask_login</span></code> and
exposes a set of hooks in the <code class="docutils literal notranslate"><span class="pre">airflow.default_login</span></code> module. You can
alter the content and make it part of the <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code> and configure it as a backend in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">authenticate</span> <span class="o">=</span> True
<span class="nv">auth_backend</span> <span class="o">=</span> mypackage.auth
</pre>
</div>
</div>
</div>
</div>
<div class="section" id="multi-tenancy">
<h2 class="sigil_not_in_toc">Multi-tenancy</h2>
<p>You can filter the list of DAGs in the webserver by owner name when authentication
is turned on by setting <code class="docutils literal notranslate"><span class="pre">webserver:filter_by_owner</span></code> in your config. With this, a user will see
only the DAGs that they own, unless they are a superuser.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">filter_by_owner</span> <span class="o">=</span> True
</pre>
</div>
</div>
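<p>The filtering rule amounts to the following (illustrative logic, not the actual webserver query):</p>

```python
def visible_dags(dags, username, is_superuser, filter_by_owner=True):
    # dags: mapping of dag_id -> owner name
    if is_superuser or not filter_by_owner:
        return sorted(dags)
    return sorted(dag_id for dag_id, owner in dags.items() if owner == username)

dags = {"etl_daily": "alice", "ml_train": "bob"}
```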
</div>
<div class="section" id="kerberos">
<h2 class="sigil_not_in_toc">Kerberos</h2>
<p>Airflow has initial support for Kerberos. This means that Airflow can renew Kerberos
tickets for itself and store them in the ticket cache. Hooks and DAGs can use these tickets
to authenticate against kerberized services.</p>
<div class="section" id="limitations">
<h3 class="sigil_not_in_toc">Limitations</h3>
<p>Please note that at this time not all hooks have been adjusted to make use of this functionality.
Also, Kerberos is not integrated into the web interface; for now you will have to rely on
network-level security to keep your service secure.</p>
<p>Celery integration has not been tried and tested yet. However, if you generate a keytab for every
host and launch a ticket renewer next to every worker, it will most likely work.</p>
</div>
<div class="section" id="enabling-kerberos">
<h3 class="sigil_not_in_toc">Enabling kerberos</h3>
<div class="section" id="airflow">
<h4 class="sigil_not_in_toc">Airflow</h4>
<p>To enable Kerberos you will need to generate a (service) keytab.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># in the kadmin.local or kadmin shell, create the airflow principal</span>
kadmin: addprinc -randkey airflow/fully.qualified.domain.name@YOUR-REALM.COM
<span class="c1"># Create the airflow keytab file that will contain the airflow principal</span>
kadmin: xst -norandkey -k airflow.keytab airflow/fully.qualified.domain.name
</pre>
</div>
</div>
<p>Now store this file in a location where the airflow user can read it (chmod 600), then add the following to
your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>core<span class="o">]</span>
<span class="nv">security</span> <span class="o">=</span> kerberos
<span class="o">[</span>kerberos<span class="o">]</span>
<span class="nv">keytab</span> <span class="o">=</span> /etc/airflow/airflow.keytab
<span class="nv">reinit_frequency</span> <span class="o">=</span> <span class="m">3600</span>
<span class="nv">principal</span> <span class="o">=</span> airflow
</pre>
</div>
</div>
<p>Launch the ticket renewer by</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># run ticket renewer</span>
airflow kerberos
</pre>
</div>
</div>
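<p>Conceptually, the renewer re-runs <code class="docutils literal notranslate"><span class="pre">kinit</span></code> with the configured keytab every <code class="docutils literal notranslate"><span class="pre">reinit_frequency</span></code> seconds (a sketch only; the real implementation also handles errors and ticket renewal flags):</p>

```python
def build_kinit_command(keytab, principal):
    # kinit -k: authenticate from a keytab; -t: path to that keytab
    return ["kinit", "-k", "-t", keytab, principal]

def renew(run, sleep, keytab, principal, reinit_frequency=3600, cycles=1):
    # `run` and `sleep` are injected so the loop can be exercised without a KDC.
    for _ in range(cycles):
        run(build_kinit_command(keytab, principal))
        sleep(reinit_frequency)

calls = []
renew(calls.append, lambda _s: None, "/etc/airflow/airflow.keytab", "airflow", cycles=2)
```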
</div>
<div class="section" id="hadoop">
<h4 class="sigil_not_in_toc">Hadoop</h4>
<p>If you want to use impersonation, this needs to be enabled in the <code class="docutils literal notranslate"><span class="pre">core-site.xml</span></code> of your Hadoop config.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>&lt;property&gt;
&lt;name&gt;hadoop.proxyuser.airflow.groups&lt;/name&gt;
&lt;value&gt;*&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.proxyuser.airflow.users&lt;/name&gt;
&lt;value&gt;*&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.proxyuser.airflow.hosts&lt;/name&gt;
&lt;value&gt;*&lt;/value&gt;
&lt;/property&gt;
</pre>
</div>
</div>
<p>Of course, if you need to tighten security, replace the asterisks with something more appropriate.</p>
</div>
</div>
<div class="section" id="using-kerberos-authentication">
<h3 class="sigil_not_in_toc">Using kerberos authentication</h3>
<p>The hive hook has been updated to take advantage of kerberos authentication. To allow your DAGs to
use it, simply update the connection details with, for example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">{</span> <span class="s2">&quot;use_beeline&quot;</span>: true, <span class="s2">&quot;principal&quot;</span>: <span class="s2">&quot;hive/_HOST@EXAMPLE.COM&quot;</span><span class="o">}</span>
</pre>
</div>
</div>
<p>Adjust the principal to your settings. The _HOST part will be replaced by the fully qualified domain name of
the server.</p>
<p>You can specify if you would like to use the dag owner as the user for the connection or the user specified in the login
section of the connection. For the login user, specify the following as extra:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">{</span> <span class="s2">&quot;use_beeline&quot;</span>: true, <span class="s2">&quot;principal&quot;</span>: <span class="s2">&quot;hive/_HOST@EXAMPLE.COM&quot;</span>, <span class="s2">&quot;proxy_user&quot;</span>: <span class="s2">&quot;login&quot;</span><span class="o">}</span>
</pre>
</div>
</div>
<p>For the DAG owner use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">{</span> <span class="s2">&quot;use_beeline&quot;</span>: true, <span class="s2">&quot;principal&quot;</span>: <span class="s2">&quot;hive/_HOST@EXAMPLE.COM&quot;</span>, <span class="s2">&quot;proxy_user&quot;</span>: <span class="s2">&quot;owner&quot;</span><span class="o">}</span>
</pre>
</div>
</div>
<p>and in your DAG, when initializing the HiveOperator, specify:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">run_as_owner</span><span class="o">=</span>True
</pre>
</div>
</div>
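<p>The resulting choice of effective user can be sketched as follows (illustrative; the actual logic lives inside the hive hook):</p>

```python
def effective_user(extra, dag_owner, login, run_as_owner=False):
    # extra: the connection's extra dict, e.g. {"proxy_user": "login"}
    proxy = extra.get("proxy_user")
    if run_as_owner or proxy == "owner":
        return dag_owner
    if proxy == "login":
        return login
    return None  # no impersonation requested
```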
</div>
</div>
<div class="section" id="oauth-authentication">
<h2 class="sigil_not_in_toc">OAuth Authentication</h2>
<div class="section" id="github-enterprise-ghe-authentication">
<h3 class="sigil_not_in_toc">GitHub Enterprise (GHE) Authentication</h3>
<p>The GitHub Enterprise authentication backend can be used to authenticate users
against an installation of GitHub Enterprise using OAuth2. You can optionally
specify a team whitelist (composed of slug cased team names) to restrict login
to only members of those teams.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">authenticate</span> <span class="o">=</span> True
<span class="nv">auth_backend</span> <span class="o">=</span> airflow.contrib.auth.backends.github_enterprise_auth
<span class="o">[</span>github_enterprise<span class="o">]</span>
<span class="nv">host</span> <span class="o">=</span> github.example.com
<span class="nv">client_id</span> <span class="o">=</span> oauth_key_from_github_enterprise
<span class="nv">client_secret</span> <span class="o">=</span> oauth_secret_from_github_enterprise
<span class="nv">oauth_callback_route</span> <span class="o">=</span> /example/ghe_oauth/callback
<span class="nv">allowed_teams</span> <span class="o">=</span> <span class="m">1</span>, <span class="m">345</span>, <span class="m">23</span>
</pre>
</div>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If you do not specify a team whitelist, anyone with a valid account on
your GHE installation will be able to login to Airflow.</p>
</div>
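<p>The whitelist check reduces to an intersection test (illustrative; the backend compares against the team IDs it fetches from GHE for the logged-in user):</p>

```python
def is_allowed(user_team_ids, allowed_teams):
    # allowed_teams as written in airflow.cfg, e.g. "1, 345, 23"; empty = allow all
    if not allowed_teams.strip():
        return True
    allowed = {int(team) for team in allowed_teams.split(",")}
    return bool(allowed & set(user_team_ids))
```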
<div class="section" id="setting-up-ghe-authentication">
<h4 class="sigil_not_in_toc">Setting up GHE Authentication</h4>
<p>An application must be set up in GHE before you can use the GHE authentication
backend. In order to set up an application:</p>
<ol class="arabic simple">
<li>Navigate to your GHE profile</li>
<li>Select &#x2018;Applications&#x2019; from the left hand nav</li>
<li>Select the &#x2018;Developer Applications&#x2019; tab</li>
<li>Click &#x2018;Register new application&#x2019;</li>
<li>Fill in the required information (the &#x2018;Authorization callback URL&#x2019; must be fully qualified e.g. <a class="reference external" href="http://airflow.example.com/example/ghe_oauth/callback">http://airflow.example.com/example/ghe_oauth/callback</a>)</li>
<li>Click &#x2018;Register application&#x2019;</li>
<li>Copy &#x2018;Client ID&#x2019;, &#x2018;Client Secret&#x2019;, and your callback route to your airflow.cfg according to the above example</li>
</ol>
</div>
<div class="section" id="using-ghe-authentication-with-github-com">
<h4 class="sigil_not_in_toc">Using GHE Authentication with github.com</h4>
<p>It is possible to use GHE authentication with github.com:</p>
<ol class="arabic simple">
<li><a class="reference external" href="https://developer.github.com/apps/building-oauth-apps/creating-an-oauth-app/">Create an Oauth App</a></li>
<li>Copy &#x2018;Client ID&#x2019;, &#x2018;Client Secret&#x2019; to your airflow.cfg according to the above example</li>
<li>Set <code class="docutils literal notranslate"><span class="pre">host</span> <span class="pre">=</span> <span class="pre">github.com</span></code> and <code class="docutils literal notranslate"><span class="pre">oauth_callback_route</span> <span class="pre">=</span> <span class="pre">/oauth/callback</span></code> in airflow.cfg</li>
</ol>
</div>
</div>
<div class="section" id="google-authentication">
<h3 class="sigil_not_in_toc">Google Authentication</h3>
<p>The Google authentication backend can be used to authenticate users
against Google using OAuth2. You must specify the email domains, separated by commas,
to restrict login to members of those domains only.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">authenticate</span> <span class="o">=</span> True
<span class="nv">auth_backend</span> <span class="o">=</span> airflow.contrib.auth.backends.google_auth
<span class="o">[</span>google<span class="o">]</span>
<span class="nv">client_id</span> <span class="o">=</span> google_client_id
<span class="nv">client_secret</span> <span class="o">=</span> google_client_secret
<span class="nv">oauth_callback_route</span> <span class="o">=</span> /oauth2callback
<span class="nv">domain</span> <span class="o">=</span> <span class="s2">&quot;example1.com,example2.com&quot;</span>
</pre>
</div>
</div>
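<p>The domain restriction is a suffix check on the authenticated email address (an illustrative version of the backend&#x2019;s check):</p>

```python
def domain_allowed(email, domains):
    # domains as written in airflow.cfg: comma-separated, e.g. "example1.com,example2.com"
    allowed = {d.strip().lower() for d in domains.split(",")}
    return email.rsplit("@", 1)[-1].lower() in allowed
```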
<div class="section" id="setting-up-google-authentication">
<h4 class="sigil_not_in_toc">Setting up Google Authentication</h4>
<p>An application must be set up in the Google API Console before you can use the Google authentication
backend. In order to set up an application:</p>
<ol class="arabic simple">
<li>Navigate to <a class="reference external" href="https://console.developers.google.com/apis/">https://console.developers.google.com/apis/</a></li>
<li>Select &#x2018;Credentials&#x2019; from the left hand nav</li>
<li>Click &#x2018;Create credentials&#x2019; and choose &#x2018;OAuth client ID&#x2019;</li>
<li>Choose &#x2018;Web application&#x2019;</li>
<li>Fill in the required information (the &#x2018;Authorized redirect URIs&#x2019; must be fully qualified e.g. <a class="reference external" href="http://airflow.example.com/oauth2callback">http://airflow.example.com/oauth2callback</a>)</li>
<li>Click &#x2018;Create&#x2019;</li>
<li>Copy &#x2018;Client ID&#x2019;, &#x2018;Client Secret&#x2019;, and your redirect URI to your airflow.cfg according to the above example</li>
</ol>
</div>
</div>
</div>
<div class="section" id="ssl">
<h2 class="sigil_not_in_toc">SSL</h2>
<p>SSL can be enabled by providing a certificate and key. Once enabled, be sure to use
&#x201C;<a class="reference external" href="https://">https://</a>&#x201D; in your browser.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>webserver<span class="o">]</span>
<span class="nv">web_server_ssl_cert</span> <span class="o">=</span> &lt;path to cert&gt;
<span class="nv">web_server_ssl_key</span> <span class="o">=</span> &lt;path to key&gt;
</pre>
</div>
</div>
<p>Enabling SSL will not automatically change the web server port. If you want to use the
standard port 443, you&#x2019;ll need to configure that too. Be aware that super user privileges
(or cap_net_bind_service on Linux) are required to listen on port 443.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Optionally, set the server to listen on the standard SSL port.</span>
<span class="nv">web_server_port</span> <span class="o">=</span> <span class="m">443</span>
<span class="nv">base_url</span> <span class="o">=</span> https://&lt;hostname or IP&gt;:443
</pre>
</div>
</div>
<p>Enable CeleryExecutor with SSL. Ensure you properly generate client and server
certs and keys.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>celery<span class="o">]</span>
<span class="nv">CELERY_SSL_ACTIVE</span> <span class="o">=</span> True
<span class="nv">CELERY_SSL_KEY</span> <span class="o">=</span> &lt;path to key&gt;
<span class="nv">CELERY_SSL_CERT</span> <span class="o">=</span> &lt;path to cert&gt;
<span class="nv">CELERY_SSL_CACERT</span> <span class="o">=</span> &lt;path to cacert&gt;
</pre>
</div>
</div>
</div>
<div class="section" id="impersonation">
<h2 class="sigil_not_in_toc">Impersonation</h2>
<p>Airflow has the ability to impersonate a unix user while running task
instances based on the task&#x2019;s <code class="docutils literal notranslate"><span class="pre">run_as_user</span></code> parameter, which takes a user&#x2019;s name.</p>
<p><strong>NOTE:</strong> For impersonation to work, Airflow must be run with <cite>sudo</cite>, as subtasks are run
with <cite>sudo -u</cite> and the permissions of files are changed. Furthermore, the unix user needs to
exist on the worker. Here is what a simple sudoers file entry could look like to achieve
this, assuming Airflow runs as the <cite>airflow</cite> user. Note that this means
the airflow user must be trusted and treated the same way as the root user.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>airflow ALL=(ALL) NOPASSWD: ALL
</pre>
</div>
</div>
<p>Subtasks with impersonation will still log to the same folder, except that the files they
log to will have permissions changed such that only the unix user can write to them.</p>
<div class="section" id="default-impersonation">
<h3 class="sigil_not_in_toc">Default Impersonation</h3>
<p>To prevent tasks that don&#x2019;t use impersonation from being run with <cite>sudo</cite> privileges, you can set the
<code class="docutils literal notranslate"><span class="pre">core:default_impersonation</span></code> config, which sets a default user to impersonate if <cite>run_as_user</cite> is
not set.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>core<span class="o">]</span>
<span class="nv">default_impersonation</span> <span class="o">=</span> airflow
</pre>
</div>
</div>
</div>
</div>
</body>
</html>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Time zones</h1>
<p>Support for time zones is enabled by default. Airflow stores datetime information in UTC internally and in the database.
It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert them to the
end user&#x2019;s time zone in the user interface. There it will always be displayed in UTC. Also templates used in Operators
are not converted. Time zone information is exposed and it is up to the writer of the DAG to decide what to do with it.</p>
<p>This is handy if your users live in more than one time zone and you want to display datetime information according to
each user&#x2019;s wall clock.</p>
<p>Even if you are running Airflow in only one time zone it is still good practice to store data in UTC in your database
(before Airflow became time zone aware, this was also the recommended or even required setup). The main reason is
Daylight Saving Time (DST). Many countries have a system of DST, where clocks are moved forward in spring and backward
in autumn. If you&#x2019;re working in local time, you&#x2019;re likely to encounter errors twice a year, when the transitions
happen. (The pendulum and pytz documentation discusses these issues in greater detail.) This probably doesn&#x2019;t matter
for a simple DAG, but it&#x2019;s a problem if you are in, for example, financial services where you have end of day
deadlines to meet.</p>
<p>The time zone is set in <cite>airflow.cfg</cite>. By default it is set to utc, but you can change it to use the system&#x2019;s settings or
an arbitrary IANA time zone, e.g. <cite>Europe/Amsterdam</cite>. It is dependent on <cite>pendulum</cite>, which is more accurate than <cite>pytz</cite>.
Pendulum is installed when you install Airflow.</p>
<p>Please note that the Web UI currently only runs in UTC.</p>
<div class="section" id="concepts">
<h2 class="sigil_not_in_toc">Concepts</h2>
<div class="section" id="naive-and-aware-datetime-objects">
<h3 class="sigil_not_in_toc">Na&#xEF;ve and aware datetime objects</h3>
<p>Python&#x2019;s datetime.datetime objects have a tzinfo attribute that can be used to store time zone information,
represented as an instance of a subclass of datetime.tzinfo. When this attribute is set and describes an offset,
a datetime object is aware. Otherwise, it&#x2019;s naive.</p>
<p>You can use timezone.is_aware() and timezone.is_naive() to determine whether datetimes are aware or naive.</p>
<p>Because Airflow uses time-zone-aware datetime objects, any datetime objects your code creates need to be aware too.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">airflow.utils</span> <span class="k">import</span> <span class="n">timezone</span>
<span class="n">now</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">utcnow</span><span class="p">()</span>
<span class="n">a_date</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2017</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</pre>
</div>
</div>
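<p>The distinction can be illustrated with the standard library alone; the check below mirrors what <cite>timezone.is_aware()</cite> does (an illustrative sketch, not Airflow&#x2019;s exact implementation):</p>

```python
from datetime import datetime, timezone

naive = datetime(2017, 1, 1)
aware = datetime(2017, 1, 1, tzinfo=timezone.utc)

def is_aware(dt):
    # A datetime is aware if tzinfo is set and actually yields a UTC offset.
    return dt.tzinfo is not None and dt.utcoffset() is not None

print(is_aware(naive))  # False
print(is_aware(aware))  # True
```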
</div>
<div class="section" id="interpretation-of-naive-datetime-objects">
<h3 class="sigil_not_in_toc">Interpretation of naive datetime objects</h3>
<p>Although Airflow operates fully time zone aware, it still accepts naive datetime objects for <cite>start_dates</cite>
and <cite>end_dates</cite> in your DAG definitions. This is mostly in order to preserve backwards compatibility. In
case a naive <cite>start_date</cite> or <cite>end_date</cite> is encountered, the default time zone is applied, and the naive
datetime is assumed to already be in the default time zone. In other
words, if you have a default time zone setting of <cite>Europe/Amsterdam</cite> and create a naive datetime <cite>start_date</cite> of
<cite>datetime(2017,1,1)</cite>, it is assumed to be a <cite>start_date</cite> of Jan 1, 2017 Amsterdam time.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">default_args</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
<span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2016</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="n">owner</span><span class="o">=</span><span class="s1">&apos;Airflow&apos;</span>
<span class="p">)</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_dag&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span>
<span class="n">op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;dummy&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">op</span><span class="o">.</span><span class="n">owner</span><span class="p">)</span> <span class="c1"># Airflow</span>
</pre>
</div>
</div>
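<p>The Amsterdam interpretation described above can be reproduced with the standard library&#x2019;s <cite>zoneinfo</cite> module (a sketch assuming Python 3.9+ with system tz data; Airflow itself relies on <cite>pendulum</cite>):</p>

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# A naive start_date, interpreted as already being in the default time zone.
naive_start = datetime(2017, 1, 1)
local_start = naive_start.replace(tzinfo=ZoneInfo("Europe/Amsterdam"))

# Amsterdam is UTC+1 in January, so this corresponds to 2016-12-31 23:00 UTC.
print(local_start.astimezone(timezone.utc))  # 2016-12-31 23:00:00+00:00
```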
<p>Unfortunately, during DST transitions, some datetimes don&#x2019;t exist or are ambiguous.
In such situations, pendulum raises an exception. That&#x2019;s why you should always create aware
datetime objects when time zone support is enabled.</p>
<p>In practice, this is rarely an issue. Airflow gives you aware datetime objects in the models and DAGs, and most often,
new datetime objects are created from existing ones through timedelta arithmetic. The only datetime that&#x2019;s often
created in application code is the current time, and timezone.utcnow() automatically does the right thing.</p>
</div>
<div class="section" id="default-time-zone">
<h3 class="sigil_not_in_toc">Default time zone</h3>
<p>The default time zone is the time zone defined by the <cite>default_timezone</cite> setting under <cite>[core]</cite>. If
you just installed Airflow it will be set to <cite>utc</cite>, which is recommended. You can also set it to
<cite>system</cite> or an IANA time zone (e.g. <cite>Europe/Amsterdam</cite>). Since DAGs are also evaluated on Airflow workers,
it is important to make sure this setting is the same on all Airflow nodes.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">core</span><span class="p">]</span>
<span class="n">default_timezone</span> <span class="o">=</span> <span class="n">utc</span>
</pre>
</div>
</div>
</div>
</div>
<div class="section" id="time-zone-aware-dags">
<h2 class="sigil_not_in_toc">Time zone aware DAGs</h2>
<p>Creating a time zone aware DAG is quite simple. Just make sure to supply a time zone aware <cite>start_date</cite>. It is
recommended to use <cite>pendulum</cite> for this, but <cite>pytz</cite> (to be installed manually) can also be used.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pendulum</span>
<span class="n">local_tz</span> <span class="o">=</span> <span class="n">pendulum</span><span class="o">.</span><span class="n">timezone</span><span class="p">(</span><span class="s2">&quot;Europe/Amsterdam&quot;</span><span class="p">)</span>
<span class="n">default_args</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
<span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2016</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">tzinfo</span><span class="o">=</span><span class="n">local_tz</span><span class="p">),</span>
<span class="n">owner</span><span class="o">=</span><span class="s1">&apos;Airflow&apos;</span>
<span class="p">)</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;my_tz_dag&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span>
<span class="n">op</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;dummy&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">dag</span><span class="o">.</span><span class="n">timezone</span><span class="p">)</span> <span class="c1"># &lt;Timezone [Europe/Amsterdam]&gt;</span>
</pre>
</div>
</div>
<div class="section" id="templates">
<h3 class="sigil_not_in_toc">Templates</h3>
<p>Airflow returns time zone aware datetimes in templates, but does not convert them to local time so they remain in UTC.
It is left up to the DAG to handle this.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pendulum</span>
<span class="n">local_tz</span> <span class="o">=</span> <span class="n">pendulum</span><span class="o">.</span><span class="n">timezone</span><span class="p">(</span><span class="s2">&quot;Europe/Amsterdam&quot;</span><span class="p">)</span>
<span class="n">local_tz</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="n">execution_date</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
<div class="section" id="cron-schedules">
<h3 class="sigil_not_in_toc">Cron schedules</h3>
<p>In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will
then ignore daylight saving time. Thus, if you have a schedule that says
run at the end of the interval every day at 08:00 GMT+1, it will always run at the end of the interval at 08:00 GMT+1,
regardless of whether daylight saving time is in effect.</p>
</div>
<div class="section" id="time-deltas">
<h3 class="sigil_not_in_toc">Time deltas</h3>
<p>For schedules with time deltas Airflow assumes you will always want to run with the specified interval. So if you
specify a timedelta(hours=2), the next run will always be two hours after the previous one. In this case daylight saving time will
be taken into account.</p>
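<p>The difference between the cron and timedelta behaviors shows up around a DST transition. A sketch with the standard library&#x2019;s <cite>zoneinfo</cite> (assuming Python 3.9+ with system tz data; Airflow uses <cite>pendulum</cite> internally, but the arithmetic is the same):</p>

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

ams = ZoneInfo("Europe/Amsterdam")
# 08:00 local time the day before DST starts (2018-03-25 in the EU).
before = datetime(2018, 3, 24, 8, 0, tzinfo=ams)

# Cron-style: same wall-clock time every day -> still 08:00 local.
wall_clock = before + timedelta(days=1)

# Timedelta-style: a fixed 24-hour interval -> 09:00 local after the clocks jump.
absolute = (before.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(ams)

print(wall_clock.hour)  # 8
print(absolute.hour)    # 9
```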
</div>
</div>
</body>
</html>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Experimental REST API</h1>
<p>Airflow exposes an experimental REST API. It is available through the webserver. Endpoints are
available at /api/experimental/. Please note that we expect the endpoint definitions to change.</p>
<div class="section" id="endpoints">
<h2 class="sigil_not_in_toc">Endpoints</h2>
<p>This is a placeholder until the swagger definitions are active.</p>
<ul class="simple">
<li>/api/experimental/dags/&lt;DAG_ID&gt;/tasks/&lt;TASK_ID&gt; returns info for a task (GET).</li>
<li>/api/experimental/dags/&lt;DAG_ID&gt;/dag_runs creates a dag_run for a given dag id (POST).</li>
</ul>
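<p>For example, the dag_runs endpoint above can be called with a small JSON POST. The sketch below only builds the request with the standard library (host, port, DAG id, and the <cite>conf</cite> payload are placeholders); sending it requires a running webserver:</p>

```python
import json
from urllib import request

# Placeholder values -- substitute your webserver address and DAG id.
endpoint = "http://localhost:8080/api/experimental/dags/example_bash_operator/dag_runs"
payload = json.dumps({"conf": {"key": "value"}}).encode("utf-8")

req = request.Request(
    endpoint,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(req.method)          # POST
print(req.get_full_url())  # the endpoint above
# request.urlopen(req)     # uncomment to send against a live webserver
```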
</div>
<div class="section" id="cli">
<h2 class="sigil_not_in_toc">CLI</h2>
<p>For some functions the CLI can use the API. To configure the CLI to use the API when available,
configure it as follows:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>cli<span class="o">]</span>
<span class="nv">api_client</span> <span class="o">=</span> airflow.api.client.json_client
<span class="nv">endpoint_url</span> <span class="o">=</span> http://&lt;WEBSERVER&gt;:&lt;PORT&gt;
</pre>
</div>
</div>
</div>
<div class="section" id="authentication">
<h2 class="sigil_not_in_toc">Authentication</h2>
<p>Authentication for the API is handled separately from the Web Authentication. The default is to not
require any authentication on the API &#x2013; i.e. wide open by default. This is not recommended if your
Airflow webserver is publicly accessible, and you should probably use the deny all backend:</p>
<div class="highlight-ini notranslate"><div class="highlight"><pre><span></span><span class="k">[api]</span>
<span class="na">auth_backend</span> <span class="o">=</span> <span class="s">airflow.api.auth.backend.deny_all</span>
</pre>
</div>
</div>
<p>Two &#x201C;real&#x201D; methods for authentication are currently supported for the API.</p>
<p>To enable Password authentication, set the following in the configuration:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>api<span class="o">]</span>
<span class="nv">auth_backend</span> <span class="o">=</span> airflow.contrib.auth.backends.password_auth
</pre>
</div>
</div>
<p>Its usage is similar to the Password Authentication used for the Web interface.</p>
<p>To enable Kerberos authentication, set the following in the configuration:</p>
<div class="highlight-ini notranslate"><div class="highlight"><pre><span></span><span class="k">[api]</span>
<span class="na">auth_backend</span> <span class="o">=</span> <span class="s">airflow.api.auth.backend.kerberos_auth</span>
<span class="k">[kerberos]</span>
<span class="na">keytab</span> <span class="o">=</span> <span class="s">&lt;KEYTAB&gt;</span>
</pre>
</div>
</div>
<p>The Kerberos service is configured as <code class="docutils literal notranslate"><span class="pre">airflow/fully.qualified.domainname@REALM</span></code>. Make sure this
principal exists in the keytab file.</p>
</div>
</body>
</html>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Lineage</h1>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Lineage support is very experimental and subject to change.</p>
</div>
<p>Airflow can help track origins of data, what happens to it and where it moves over time. This can aid having
audit trails and data governance, but also debugging of data flows.</p>
<p>Airflow tracks data by means of inlets and outlets of the tasks. Let&#x2019;s work from an example and see how it
works.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">airflow.operators.bash_operator</span> <span class="k">import</span> <span class="n">BashOperator</span>
<span class="kn">from</span> <span class="nn">airflow.operators.dummy_operator</span> <span class="k">import</span> <span class="n">DummyOperator</span>
<span class="kn">from</span> <span class="nn">airflow.lineage.datasets</span> <span class="k">import</span> <span class="n">File</span>
<span class="kn">from</span> <span class="nn">airflow.models</span> <span class="k">import</span> <span class="n">DAG</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">timedelta</span>
<span class="n">FILE_CATEGORIES</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;CAT1&quot;</span><span class="p">,</span> <span class="s2">&quot;CAT2&quot;</span><span class="p">,</span> <span class="s2">&quot;CAT3&quot;</span><span class="p">]</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&apos;owner&apos;</span><span class="p">:</span> <span class="s1">&apos;airflow&apos;</span><span class="p">,</span>
<span class="s1">&apos;start_date&apos;</span><span class="p">:</span> <span class="n">airflow</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">dates</span><span class="o">.</span><span class="n">days_ago</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span>
<span class="n">dag_id</span><span class="o">=</span><span class="s1">&apos;example_lineage&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">args</span><span class="p">,</span>
<span class="n">schedule_interval</span><span class="o">=</span><span class="s1">&apos;0 0 * * *&apos;</span><span class="p">,</span>
<span class="n">dagrun_timeout</span><span class="o">=</span><span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">60</span><span class="p">))</span>
<span class="n">f_final</span> <span class="o">=</span> <span class="n">File</span><span class="p">(</span><span class="s2">&quot;/tmp/final&quot;</span><span class="p">)</span>
<span class="n">run_this_last</span> <span class="o">=</span> <span class="n">DummyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;run_this_last&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">,</span>
<span class="n">inlets</span><span class="o">=</span><span class="p">{</span><span class="s2">&quot;auto&quot;</span><span class="p">:</span> <span class="kc">True</span><span class="p">},</span>
<span class="n">outlets</span><span class="o">=</span><span class="p">{</span><span class="s2">&quot;datasets&quot;</span><span class="p">:</span> <span class="p">[</span><span class="n">f_final</span><span class="p">,]})</span>
<span class="n">f_in</span> <span class="o">=</span> <span class="n">File</span><span class="p">(</span><span class="s2">&quot;/tmp/whole_directory/&quot;</span><span class="p">)</span>
<span class="n">outlets</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">FILE_CATEGORIES</span><span class="p">:</span>
<span class="n">f_out</span> <span class="o">=</span> <span class="n">File</span><span class="p">(</span><span class="s2">&quot;/tmp/</span><span class="si">{}</span><span class="s2">/{{{{ execution_date }}}}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">file</span><span class="p">))</span>
<span class="n">outlets</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f_out</span><span class="p">)</span>
<span class="n">run_this</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;run_me_first&apos;</span><span class="p">,</span> <span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;echo 1&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">,</span>
<span class="n">inlets</span><span class="o">=</span><span class="p">{</span><span class="s2">&quot;datasets&quot;</span><span class="p">:</span> <span class="p">[</span><span class="n">f_in</span><span class="p">,]},</span>
<span class="n">outlets</span><span class="o">=</span><span class="p">{</span><span class="s2">&quot;datasets&quot;</span><span class="p">:</span> <span class="n">outlets</span><span class="p">}</span>
<span class="p">)</span>
<span class="n">run_this</span><span class="o">.</span><span class="n">set_downstream</span><span class="p">(</span><span class="n">run_this_last</span><span class="p">)</span>
</pre>
</div>
</div>
<p>Tasks take the parameters <cite>inlets</cite> and <cite>outlets</cite>. Inlets can be manually defined by a list of dataset <cite>{&#x201C;datasets&#x201D;:
[dataset1, dataset2]}</cite> or can be configured to look for outlets from upstream tasks <cite>{&#x201C;task_ids&#x201D;: [&#x201C;task_id1&#x201D;, &#x201C;task_id2&#x201D;]}</cite>
or can be configured to pick up outlets from direct upstream tasks <cite>{&#x201C;auto&#x201D;: True}</cite> or a combination of them. Outlets
are defined as list of dataset <cite>{&#x201C;datasets&#x201D;: [dataset1, dataset2]}</cite>. Any fields for the dataset are templated with
the context when the task is being executed.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Operators can add inlets and outlets automatically if the operator supports it.</p>
</div>
<p>In the example DAG, task <cite>run_me_first</cite> is a BashOperator that takes 3 outlets: <cite>CAT1</cite>, <cite>CAT2</cite>, <cite>CAT3</cite>, which are
generated from a list. Note that <cite>execution_date</cite> is a templated field and will be rendered when the task is running.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Behind the scenes Airflow prepares the lineage metadata as part of the <cite>pre_execute</cite> method of a task. When the task
has finished execution <cite>post_execute</cite> is called and lineage metadata is pushed into XCOM. Thus if you are creating
your own operators that override this method make sure to decorate your method with <cite>prepare_lineage</cite> and <cite>apply_lineage</cite>
respectively.</p>
</div>
<div class="section" id="apache-atlas">
<h2 class="sigil_not_in_toc">Apache Atlas</h2>
<p>Airflow can send its lineage metadata to Apache Atlas. You need to enable the <cite>atlas</cite> backend and configure it
properly, e.g. in your <cite>airflow.cfg</cite>:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">lineage</span><span class="p">]</span>
<span class="n">backend</span> <span class="o">=</span> <span class="n">airflow</span><span class="o">.</span><span class="n">lineage</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">atlas</span>
<span class="p">[</span><span class="n">atlas</span><span class="p">]</span>
<span class="n">username</span> <span class="o">=</span> <span class="n">my_username</span>
<span class="n">password</span> <span class="o">=</span> <span class="n">my_password</span>
<span class="n">host</span> <span class="o">=</span> <span class="n">host</span>
<span class="n">port</span> <span class="o">=</span> <span class="mi">21000</span>
</pre>
</div>
</div>
<p>Please make sure to have the <cite>atlasclient</cite> package installed.</p>
</div>
</body>
</html>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Quick Start</h1>
<p>The installation is quick and straightforward.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># airflow needs a home, ~/airflow is the default,</span>
<span class="c1"># but you can lay foundation somewhere else if you prefer</span>
<span class="c1"># (optional)</span>
<span class="nb">export</span> <span class="nv">AIRFLOW_HOME</span><span class="o">=</span>~/airflow
<span class="c1"># install from pypi using pip</span>
pip install apache-airflow
<span class="c1"># initialize the database</span>
airflow initdb
<span class="c1"># start the web server, default port is 8080</span>
airflow webserver -p <span class="m">8080</span>
<span class="c1"># start the scheduler</span>
airflow scheduler
<span class="c1"># visit localhost:8080 in the browser and enable the example dag in the home page</span>
</pre>
</div>
</div>
<p>Upon running these commands, Airflow will create the <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME</span></code> folder
and lay an &#x201C;airflow.cfg&#x201D; file with defaults that get you going fast. You can
inspect the file either in <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/airflow.cfg</span></code>, or through the UI in
the <code class="docutils literal notranslate"><span class="pre">Admin-&gt;Configuration</span></code> menu. The PID file for the webserver will be stored
in <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/airflow-webserver.pid</span></code> or in <code class="docutils literal notranslate"><span class="pre">/run/airflow/webserver.pid</span></code>
if started by systemd.</p>
<p>Out of the box, Airflow uses a sqlite database, which you should outgrow
fairly quickly since no parallelization is possible using this database
backend. It works in conjunction with the <code class="docutils literal notranslate"><span class="pre">SequentialExecutor</span></code> which will
only run task instances sequentially. While this is very limiting, it allows
you to get up and running quickly and take a tour of the UI and the
command line utilities.</p>
<p>Here are a few commands that will trigger a few task instances. You should
be able to see the status of the jobs change in the <code class="docutils literal notranslate"><span class="pre">example_bash_operator</span></code> DAG as you
run the commands below.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># run your first task instance</span>
airflow run example_bash_operator runme_0 <span class="m">2015</span>-01-01
<span class="c1"># run a backfill over 2 days</span>
airflow backfill example_bash_operator -s <span class="m">2015</span>-01-01 -e <span class="m">2015</span>-01-02
</pre>
</div>
</div>
<div class="section" id="what-s-next">
<h2 class="sigil_not_in_toc">What&#x2019;s Next?</h2>
<p>From this point, you can head to the <a class="reference internal" href="tutorial.html"><span class="doc">Tutorial</span></a> section for further examples or the <a class="reference internal" href="howto/index.html"><span class="doc">How-to Guides</span></a> section if you&#x2019;re ready to get your hands dirty.</p>
</div>
</body>
</html>
\ No newline at end of file
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>FAQ</h1>
<div class="section" id="why-isn-t-my-task-getting-scheduled">
<h2 class="sigil_not_in_toc">Why isn&#x2019;t my task getting scheduled?</h2>
<p>There are very many reasons why your task might not be getting scheduled.
Here are some of the common causes:</p>
<ul class="simple">
<li>Does your script &#x201C;compile&#x201D;? Can the Airflow engine parse it and find your
DAG object? To test this, you can run <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">list_dags</span></code> and
confirm that your DAG shows up in the list. You can also run
<code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">list_tasks</span> <span class="pre">foo_dag_id</span> <span class="pre">--tree</span></code> and confirm that your task
shows up in the list as expected. If you use the CeleryExecutor, you
may want to confirm that this works both where the scheduler runs as well
as where the worker runs.</li>
<li>Does the file containing your DAG contain the string &#x201C;airflow&#x201D; and &#x201C;DAG&#x201D; somewhere
in the contents? When searching the DAG directory, Airflow ignores files not containing
&#x201C;airflow&#x201D; and &#x201C;DAG&#x201D; in order to prevent the DagBag parsing from importing all python
files collocated with user&#x2019;s DAGs.</li>
<li>Is your <code class="docutils literal notranslate"><span class="pre">start_date</span></code> set properly? The Airflow scheduler triggers the
task soon after the <code class="docutils literal notranslate"><span class="pre">start_date</span> <span class="pre">+</span> <span class="pre">schedule_interval</span></code> is passed.</li>
<li>Is your <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> set properly? The default <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>
is one day (<code class="docutils literal notranslate"><span class="pre">datetime.timedelta(1)</span></code>). You must specify a different <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>
directly to the DAG object you instantiate, not as a <code class="docutils literal notranslate"><span class="pre">default_param</span></code>, as task instances
do not override their parent DAG&#x2019;s <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>.</li>
<li>Is your <code class="docutils literal notranslate"><span class="pre">start_date</span></code> beyond where you can see it in the UI? If you
set your <code class="docutils literal notranslate"><span class="pre">start_date</span></code> to some time, say 3 months ago, you won&#x2019;t be able to see
it in the main view in the UI, but you should be able to see it in the
<code class="docutils literal notranslate"><span class="pre">Menu</span> <span class="pre">-&gt;</span> <span class="pre">Browse</span> <span class="pre">-&gt;</span> <span class="pre">Task</span> <span class="pre">Instances</span></code>.</li>
<li>Are the dependencies for the task met? The task instances directly
upstream from the task need to be in a <code class="docutils literal notranslate"><span class="pre">success</span></code> state. Also,
if you have set <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code>, the previous task instance
needs to have succeeded (except if it is the first run for that task).
Also, if <code class="docutils literal notranslate"><span class="pre">wait_for_downstream=True</span></code>, make sure you understand
what it means.
You can view how these properties are set from the <code class="docutils literal notranslate"><span class="pre">Task</span> <span class="pre">Instance</span> <span class="pre">Details</span></code>
page for your task.</li>
<li>Are the DagRuns you need created and active? A DagRun represents a specific
execution of an entire DAG and has a state (running, success, failed, &#x2026;).
The scheduler creates new DagRuns as it moves forward, but never goes back
in time to create new ones. The scheduler only evaluates <code class="docutils literal notranslate"><span class="pre">running</span></code> DagRuns
to see what task instances it can trigger. Note that clearing task
instances (from the UI or CLI) does set the state of a DagRun back to
running. You can bulk view the list of DagRuns and alter states by clicking
on the schedule tag for a DAG.</li>
<li>Is the <code class="docutils literal notranslate"><span class="pre">concurrency</span></code> parameter of your DAG reached? <code class="docutils literal notranslate"><span class="pre">concurrency</span></code> defines
how many <code class="docutils literal notranslate"><span class="pre">running</span></code> task instances a DAG is allowed to have, beyond which
point things get queued.</li>
<li>Is the <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> parameter of your DAG reached? <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> defines
how many <code class="docutils literal notranslate"><span class="pre">running</span></code> concurrent instances of a DAG there are allowed to be.</li>
</ul>
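<p>As a plain-<code class="docutils literal notranslate"><span class="pre">datetime</span></code> sketch of the first check above (dates are illustrative, not from the original text): the task instance covering a schedule period is only triggered once that period has closed, at <code class="docutils literal notranslate"><span class="pre">start_date</span> <span class="pre">+</span> <span class="pre">schedule_interval</span></code>.</p>

```python
from datetime import datetime, timedelta

start_date = datetime(2015, 6, 1)
schedule_interval = timedelta(days=1)  # the default schedule_interval

# The task instance for the period [start_date, start_date + interval)
# fires only after that period has closed.
first_trigger = start_date + schedule_interval
print(first_trigger)  # 2015-06-02 00:00:00
```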
<p>You may also want to read the Scheduler section of the docs and make
sure you fully understand how it proceeds.</p>
</div>
<div class="section" id="how-do-i-trigger-tasks-based-on-another-task-s-failure">
<h2 class="sigil_not_in_toc">How do I trigger tasks based on another task&#x2019;s failure?</h2>
<p>Check out the <code class="docutils literal notranslate"><span class="pre">Trigger</span> <span class="pre">Rule</span></code> section in the Concepts section of the
documentation.</p>
</div>
<div class="section" id="why-are-connection-passwords-still-not-encrypted-in-the-metadata-db-after-i-installed-airflow-crypto">
<h2 class="sigil_not_in_toc">Why are connection passwords still not encrypted in the metadata db after I installed airflow[crypto]?</h2>
<p>Check out the <code class="docutils literal notranslate"><span class="pre">Connections</span></code> section in the Configuration section of the
documentation.</p>
</div>
<div class="section" id="what-s-the-deal-with-start-date">
<h2 class="sigil_not_in_toc">What&#x2019;s the deal with <code class="docutils literal notranslate"><span class="pre">start_date</span></code>?</h2>
<p><code class="docutils literal notranslate"><span class="pre">start_date</span></code> is partly legacy from the pre-DagRun era, but it is still
relevant in many ways. When creating a new DAG, you probably want to set
a global <code class="docutils literal notranslate"><span class="pre">start_date</span></code> for your tasks using <code class="docutils literal notranslate"><span class="pre">default_args</span></code>. The first
DagRun to be created will be based on the <code class="docutils literal notranslate"><span class="pre">min(start_date)</span></code> for all your
tasks. From that point on, the scheduler creates new DagRuns based on
your <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> and the corresponding task instances run as your
dependencies are met. When introducing new tasks to your DAG, you need to
pay special attention to <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, and may want to reactivate
inactive DagRuns to get the new task onboarded properly.</p>
<p>We recommend against using dynamic values as <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, especially
<code class="docutils literal notranslate"><span class="pre">datetime.now()</span></code> as it can be quite confusing. The task is triggered
once the period closes, and in theory an <code class="docutils literal notranslate"><span class="pre">@hourly</span></code> DAG would never get to
an hour after now as <code class="docutils literal notranslate"><span class="pre">now()</span></code> moves along.</p>
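<p>A minimal sketch of why a dynamic <code class="docutils literal notranslate"><span class="pre">start_date</span></code> is a moving target (plain Python; <code class="docutils literal notranslate"><span class="pre">parse_dag_file</span></code> is a hypothetical stand-in for the scheduler re-evaluating your DAG file):</p>

```python
from datetime import datetime

def parse_dag_file():
    # Stand-in for the scheduler re-parsing the DAG file: a dynamic
    # start_date such as datetime.now() is recomputed on every parse.
    return {"start_date": datetime.now()}

first_parse = parse_dag_file()["start_date"]
second_parse = parse_dag_file()["start_date"]

# The anchor keeps moving forward, so a schedule period measured
# from it never closes and the task never triggers.
assert second_parse >= first_parse
```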
<p>Previously we also recommended using a rounded <code class="docutils literal notranslate"><span class="pre">start_date</span></code> in relation to your
<code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>. This meant an <code class="docutils literal notranslate"><span class="pre">@hourly</span></code> job would run on the hour (<code class="docutils literal notranslate"><span class="pre">00:00</span></code>
minutes:seconds), a <code class="docutils literal notranslate"><span class="pre">@daily</span></code> job at midnight, and a <code class="docutils literal notranslate"><span class="pre">@monthly</span></code> job on the
first of the month. This is no longer required. Airflow will now auto-align
the <code class="docutils literal notranslate"><span class="pre">start_date</span></code> and the <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> by using the <code class="docutils literal notranslate"><span class="pre">start_date</span></code>
as the moment to start looking.</p>
<p>You can use any sensor or a <code class="docutils literal notranslate"><span class="pre">TimeDeltaSensor</span></code> to delay
the execution of tasks within the schedule interval.
While <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> does allow specifying a <code class="docutils literal notranslate"><span class="pre">datetime.timedelta</span></code>
object, we recommend using the macros or cron expressions instead, as
it enforces this idea of rounded schedules.</p>
<p>When using <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code>, it&#x2019;s important to pay special attention
to <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, as the past dependency is waived only for the run scheduled
at the <code class="docutils literal notranslate"><span class="pre">start_date</span></code> specified for the task (the first run has no past to
depend on). It&#x2019;s also important to watch DagRun activity status when introducing
new <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code> tasks, unless you are planning on running a backfill
for the new task(s).</p>
<p>Also important to note is that the task&#x2019;s <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, in the context of a
backfill CLI command, gets overridden by the backfill command&#x2019;s <code class="docutils literal notranslate"><span class="pre">start_date</span></code>.
This allows a backfill on tasks that have <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code> to
actually start; if that weren&#x2019;t the case, the backfill just wouldn&#x2019;t start.</p>
</div>
<div class="section" id="how-can-i-create-dags-dynamically">
<h2 class="sigil_not_in_toc">How can I create DAGs dynamically?</h2>
<p>Airflow looks in your <code class="docutils literal notranslate"><span class="pre">DAGS_FOLDER</span></code> for modules that contain <code class="docutils literal notranslate"><span class="pre">DAG</span></code> objects
in their global namespace, and adds the objects it finds to the
<code class="docutils literal notranslate"><span class="pre">DagBag</span></code>. Knowing this, all we need is a way to dynamically assign
variables in the global namespace, which is easily done in Python using the
<code class="docutils literal notranslate"><span class="pre">globals()</span></code> function from the standard library, which behaves like a
simple dictionary.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
<span class="n">dag_id</span> <span class="o">=</span> <span class="s1">&apos;foo_</span><span class="si">{}</span><span class="s1">&apos;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="nb">globals</span><span class="p">()[</span><span class="n">dag_id</span><span class="p">]</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="p">)</span>
<span class="c1"># or better, call a function that returns a DAG object!</span>
</pre>
</div>
</div>
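<p>The factory pattern hinted at in the comment above can be sketched as follows. The function name and table names here are hypothetical, and <code class="docutils literal notranslate"><span class="pre">make_dag</span></code> returns a plain dict only so the sketch runs without Airflow installed; in real code it would return an actual <code class="docutils literal notranslate"><span class="pre">DAG</span></code> object.</p>

```python
def make_dag(dag_id):
    # Stand-in for a function returning airflow.DAG(dag_id);
    # a plain dict keeps the sketch runnable without Airflow.
    return {"dag_id": dag_id}

# Register one DAG per table in the module's global namespace,
# where Airflow's DagBag discovery will pick them up.
for table in ("clicks", "orders", "users"):
    dag_id = "export_{}".format(table)
    globals()[dag_id] = make_dag(dag_id)

print(sorted(name for name in globals() if name.startswith("export_")))
# ['export_clicks', 'export_orders', 'export_users']
```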
</div>
<div class="section" id="what-are-all-the-airflow-run-commands-in-my-process-list">
<h2 class="sigil_not_in_toc">What are all the <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span></code> commands in my process list?</h2>
<p>There are many layers of <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span></code> commands, meaning it can call itself.</p>
<ul class="simple">
<li>Basic <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span></code>: fires up an executor and tells it to run an
<code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--local</span></code> command. If using Celery, this means it puts a
command in the queue for a worker to run remotely. If using
LocalExecutor, that translates into running it in a subprocess pool.</li>
<li>Local <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--local</span></code>: starts an <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--raw</span></code>
command (described below) as a subprocess, and is in charge of
emitting heartbeats, listening for external kill signals,
and ensuring some cleanup takes place if the subprocess fails.</li>
<li>Raw <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--raw</span></code>: runs the actual operator&#x2019;s execute method and
performs the actual work.</li>
</ul>
</div>
<div class="section" id="how-can-my-airflow-dag-run-faster">
<h2 class="sigil_not_in_toc">How can my airflow dag run faster?</h2>
<p>There are three variables we can tune to improve Airflow DAG performance:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">parallelism</span></code>: controls the number of task instances the Airflow worker can run simultaneously. You can increase it via the <code class="docutils literal notranslate"><span class="pre">parallelism</span></code> entry in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">concurrency</span></code>: the Airflow scheduler will run no more than <code class="docutils literal notranslate"><span class="pre">concurrency</span></code> task instances for your DAG at any given time. Concurrency is defined on your Airflow DAG; if you do not set it, the scheduler uses the default value from the <code class="docutils literal notranslate"><span class="pre">dag_concurrency</span></code> entry in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code>: the Airflow scheduler will run no more than <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> DagRuns of your DAG at a given time. If you do not set it on your DAG, the scheduler uses the default value from the <code class="docutils literal notranslate"><span class="pre">max_active_runs_per_dag</span></code> entry in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
</ul>
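<p>For reference, the corresponding <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> entries might look like the following; the values are illustrative only, not recommendations, and the section name assumes these settings live under <code class="docutils literal notranslate"><span class="pre">[core]</span></code> in your Airflow version.</p>

```ini
[core]
; Max task instances across the whole installation
parallelism = 32
; Default DAG-level concurrency when a DAG does not set its own
dag_concurrency = 16
; Default max_active_runs when a DAG does not set its own
max_active_runs_per_dag = 16
```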
</div>
<div class="section" id="how-can-we-reduce-the-airflow-ui-page-load-time">
<h2 class="sigil_not_in_toc">How can we reduce the airflow UI page load time?</h2>
<p>If your DAG takes a long time to load, you can reduce the value of the <code class="docutils literal notranslate"><span class="pre">default_dag_run_display_number</span></code> configuration in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>. This setting controls the number of DAG runs shown in the UI, and defaults to 25.</p>
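<p>A minimal sketch of that change in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> (the value is illustrative, and the setting is assumed to live under <code class="docutils literal notranslate"><span class="pre">[webserver]</span></code> in your version):</p>

```ini
[webserver]
; Number of DAG runs shown per DAG in the UI (default 25)
default_dag_run_display_number = 5
```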
</div>
<div class="section" id="how-to-fix-exception-global-variable-explicit-defaults-for-timestamp-needs-to-be-on-1">
<h2 class="sigil_not_in_toc">How to fix Exception: Global variable explicit_defaults_for_timestamp needs to be on (1)?</h2>
<p>This means <code class="docutils literal notranslate"><span class="pre">explicit_defaults_for_timestamp</span></code> is disabled on your MySQL server, and you need to enable it by:</p>
<ol class="arabic simple">
<li>Set <code class="docutils literal notranslate"><span class="pre">explicit_defaults_for_timestamp</span> <span class="pre">=</span> <span class="pre">1</span></code> under the mysqld section in your my.cnf file.</li>
<li>Restart the MySQL server.</li>
</ol>
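<p>The <code class="docutils literal notranslate"><span class="pre">my.cnf</span></code> change from step 1 looks like this:</p>

```ini
[mysqld]
explicit_defaults_for_timestamp = 1
```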
</div>
<div class="section" id="how-to-reduce-airflow-dag-scheduling-latency-in-production">
<h2 class="sigil_not_in_toc">How to reduce airflow dag scheduling latency in production?</h2>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">max_threads</span></code>: the scheduler spawns multiple threads in parallel to schedule DAGs. This is controlled by <code class="docutils literal notranslate"><span class="pre">max_threads</span></code>, with a default value of 2. In production you should increase it (e.g., to the number of CPUs on the scheduler host minus 1).</li>
<li><code class="docutils literal notranslate"><span class="pre">scheduler_heartbeat_sec</span></code>: consider increasing the <code class="docutils literal notranslate"><span class="pre">scheduler_heartbeat_sec</span></code> config to a higher value (e.g., 60 seconds); it controls how frequently the Airflow scheduler heartbeats and updates the job&#x2019;s entry in the database.</li>
</ul>
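<p>The two settings above might be set in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> as follows; the values are illustrative, and the section name assumes they live under <code class="docutils literal notranslate"><span class="pre">[scheduler]</span></code> in your Airflow version.</p>

```ini
[scheduler]
; e.g., CPUs on the scheduler host minus 1
max_threads = 3
; Heartbeat less often to reduce database load
scheduler_heartbeat_sec = 60
```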
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Installation</h1>
<div class="section" id="getting-airflow">
<h2 class="sigil_not_in_toc">Getting Airflow</h2>
<p>The easiest way to install the latest stable version of Airflow is with <code class="docutils literal notranslate"><span class="pre">pip</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip install apache-airflow
</pre>
</div>
</div>
<p>You can also install Airflow with support for extra features like <code class="docutils literal notranslate"><span class="pre">s3</span></code> or <code class="docutils literal notranslate"><span class="pre">postgres</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip install apache-airflow<span class="o">[</span>postgres,s3<span class="o">]</span>
</pre>
</div>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>GPL dependency</p>
<p class="last">One of the dependencies of Apache Airflow by default pulls in a GPL library (&#x2018;unidecode&#x2019;).
In case this is a concern, you can force a non-GPL library by issuing
<code class="docutils literal notranslate"><span class="pre">export</span> <span class="pre">SLUGIFY_USES_TEXT_UNIDECODE=yes</span></code> and then proceed with the normal installation.
Please note that this needs to be specified at every upgrade. Also note that if <cite>unidecode</cite>
is already present on the system, the dependency will still be used.</p>
</div>
</div>
<div class="section" id="extra-packages">
<h2 class="sigil_not_in_toc">Extra Packages</h2>
<p>The <code class="docutils literal notranslate"><span class="pre">apache-airflow</span></code> PyPI basic package only installs what&#x2019;s needed to get started.
Subpackages can be installed depending on what will be useful in your
environment. For instance, if you don&#x2019;t need connectivity with Postgres,
you won&#x2019;t have to go through the trouble of installing the <code class="docutils literal notranslate"><span class="pre">postgres-devel</span></code>
yum package, or whatever equivalent applies on the distribution you are using.</p>
<p>Behind the scenes, Airflow does conditional imports of operators that require
these extra dependencies.</p>
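<p>The conditional-import pattern can be sketched as follows; the module name here is made up purely for illustration and is not a real Airflow dependency.</p>

```python
# Attempt to import an optional dependency; fall back gracefully
# when the corresponding extra package was not installed.
try:
    import some_optional_dependency  # hypothetical extra's module
except ImportError:
    some_optional_dependency = None

# Code elsewhere can check this flag before using the extra's features.
HAS_EXTRA = some_optional_dependency is not None
print(HAS_EXTRA)
```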
<p>Here&#x2019;s the list of the subpackages and what they enable:</p>
<table border="1" class="docutils">
<colgroup>
<col width="14%">
<col width="42%">
<col width="45%">
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">subpackage</th>
<th class="head">install command</th>
<th class="head">enables</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>all</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[all]</span></code></td>
<td>All Airflow features known to man</td>
</tr>
<tr class="row-odd"><td>all_dbs</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[all_dbs]</span></code></td>
<td>All databases integrations</td>
</tr>
<tr class="row-even"><td>async</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[async]</span></code></td>
<td>Async worker classes for Gunicorn</td>
</tr>
<tr class="row-odd"><td>celery</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[celery]</span></code></td>
<td>CeleryExecutor</td>
</tr>
<tr class="row-even"><td>cloudant</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[cloudant]</span></code></td>
<td>Cloudant hook</td>
</tr>
<tr class="row-odd"><td>crypto</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[crypto]</span></code></td>
<td>Encrypt connection passwords in metadata db</td>
</tr>
<tr class="row-even"><td>devel</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[devel]</span></code></td>
<td>Minimum dev tools requirements</td>
</tr>
<tr class="row-odd"><td>devel_hadoop</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[devel_hadoop]</span></code></td>
<td>Airflow + dependencies on the Hadoop stack</td>
</tr>
<tr class="row-even"><td>druid</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[druid]</span></code></td>
<td>Druid related operators &amp; hooks</td>
</tr>
<tr class="row-odd"><td>gcp_api</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[gcp_api]</span></code></td>
<td>Google Cloud Platform hooks and operators
(using <code class="docutils literal notranslate"><span class="pre">google-api-python-client</span></code>)</td>
</tr>
<tr class="row-even"><td>hdfs</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[hdfs]</span></code></td>
<td>HDFS hooks and operators</td>
</tr>
<tr class="row-odd"><td>hive</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[hive]</span></code></td>
<td>All Hive related operators</td>
</tr>
<tr class="row-even"><td>jdbc</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[jdbc]</span></code></td>
<td>JDBC hooks and operators</td>
</tr>
<tr class="row-odd"><td>kerberos</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[kerberos]</span></code></td>
<td>Kerberos integration for Kerberized Hadoop</td>
</tr>
<tr class="row-even"><td>ldap</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[ldap]</span></code></td>
<td>LDAP authentication for users</td>
</tr>
<tr class="row-odd"><td>mssql</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[mssql]</span></code></td>
<td>Microsoft SQL Server operators and hook,
support as an Airflow backend</td>
</tr>
<tr class="row-even"><td>mysql</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[mysql]</span></code></td>
<td>MySQL operators and hook, support as an Airflow
backend. The version of MySQL server has to be
5.6.4+. The exact version upper bound depends
on version of <code class="docutils literal notranslate"><span class="pre">mysqlclient</span></code> package. For
example, <code class="docutils literal notranslate"><span class="pre">mysqlclient</span></code> 1.3.12 can only be
used with MySQL server 5.6.4 through 5.7.</td>
</tr>
<tr class="row-odd"><td>password</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[password]</span></code></td>
<td>Password authentication for users</td>
</tr>
<tr class="row-even"><td>postgres</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[postgres]</span></code></td>
<td>PostgreSQL operators and hook, support as an
Airflow backend</td>
</tr>
<tr class="row-odd"><td>qds</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[qds]</span></code></td>
<td>Enable QDS (Qubole Data Service) support</td>
</tr>
<tr class="row-even"><td>rabbitmq</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[rabbitmq]</span></code></td>
<td>RabbitMQ support as a Celery backend</td>
</tr>
<tr class="row-odd"><td>redis</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[redis]</span></code></td>
<td>Redis hooks and sensors</td>
</tr>
<tr class="row-even"><td>s3</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[s3]</span></code></td>
<td><code class="docutils literal notranslate"><span class="pre">S3KeySensor</span></code>, <code class="docutils literal notranslate"><span class="pre">S3PrefixSensor</span></code></td>
</tr>
<tr class="row-odd"><td>samba</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[samba]</span></code></td>
<td><code class="docutils literal notranslate"><span class="pre">Hive2SambaOperator</span></code></td>
</tr>
<tr class="row-even"><td>slack</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[slack]</span></code></td>
<td><code class="docutils literal notranslate"><span class="pre">SlackAPIPostOperator</span></code></td>
</tr>
<tr class="row-odd"><td>ssh</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[ssh]</span></code></td>
<td>SSH hooks and Operator</td>
</tr>
<tr class="row-even"><td>vertica</td>
<td><code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">apache-airflow[vertica]</span></code></td>
<td>Vertica hook support as an Airflow backend</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="initiating-airflow-database">
<h2 class="sigil_not_in_toc">Initiating Airflow Database</h2>
<p>Airflow requires a database to be initialized before you can run tasks. If
you&#x2019;re just experimenting and learning Airflow, you can stick with the
default SQLite option. If you don&#x2019;t want to use SQLite, then take a look at
<a class="reference internal" href="howto/initialize-database.html"><span class="doc">Initializing a Database Backend</span></a> to setup a different database.</p>
<p>After configuration, you&#x2019;ll need to initialize the database before you can
run tasks:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>airflow initdb
</pre>
</div>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Tutorial</h1>
<p>This tutorial walks you through some of the fundamental Airflow concepts,
objects, and their usage while writing your first pipeline.</p>
<div class="section" id="example-pipeline-definition">
<h2 class="sigil_not_in_toc">Example Pipeline definition</h2>
<p>Here is an example of a basic pipeline definition. Do not worry if this looks
complicated; a line-by-line explanation follows below.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">Code that goes along with the Airflow tutorial located at:</span>
<span class="sd">https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/tutorial.py</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="kn">from</span> <span class="nn">airflow</span> <span class="k">import</span> <span class="n">DAG</span>
<span class="kn">from</span> <span class="nn">airflow.operators.bash_operator</span> <span class="k">import</span> <span class="n">BashOperator</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&apos;owner&apos;</span><span class="p">:</span> <span class="s1">&apos;airflow&apos;</span><span class="p">,</span>
<span class="s1">&apos;depends_on_past&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;start_date&apos;</span><span class="p">:</span> <span class="n">datetime</span><span class="p">(</span><span class="mi">2015</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s1">&apos;email&apos;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&apos;airflow@example.com&apos;</span><span class="p">],</span>
<span class="s1">&apos;email_on_failure&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;email_on_retry&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;retries&apos;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">&apos;retry_delay&apos;</span><span class="p">:</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">5</span><span class="p">),</span>
<span class="c1"># &apos;queue&apos;: &apos;bash_queue&apos;,</span>
<span class="c1"># &apos;pool&apos;: &apos;backfill&apos;,</span>
<span class="c1"># &apos;priority_weight&apos;: 10,</span>
<span class="c1"># &apos;end_date&apos;: datetime(2016, 1, 1),</span>
<span class="p">}</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">&apos;tutorial&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span>
<span class="c1"># t1, t2 and t3 are examples of tasks created by instantiating operators</span>
<span class="n">t1</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;print_date&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;date&apos;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">t2</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;sleep&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;sleep 5&apos;</span><span class="p">,</span>
<span class="n">retries</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">templated_command</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span>
<span class="s2"> {</span><span class="si">% f</span><span class="s2">or i in range(5) %}</span>
<span class="s2"> echo &quot;{{ ds }}&quot;</span>
<span class="s2"> echo &quot;{{ macros.ds_add(ds, 7)}}&quot;</span>
<span class="s2"> echo &quot;{{ params.my_param }}&quot;</span>
<span class="s2"> {</span><span class="si">% e</span><span class="s2">ndfor %}</span>
<span class="s2">&quot;&quot;&quot;</span>
<span class="n">t3</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;templated&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="n">templated_command</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;my_param&apos;</span><span class="p">:</span> <span class="s1">&apos;Parameter I passed in&apos;</span><span class="p">},</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">t2</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
<span class="n">t3</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
<div class="section" id="it-s-a-dag-definition-file">
<h2 class="sigil_not_in_toc">It&#x2019;s a DAG definition file</h2>
<p>One thing to wrap your head around (it may not be very intuitive for everyone
at first) is that this Airflow Python script is really
just a configuration file specifying the DAG&#x2019;s structure as code.
The actual tasks defined here will run in a different context from
the context of this script. Different tasks run on different workers
at different points in time, which means that this script cannot be used
to cross communicate between tasks. Note that for this
purpose we have a more advanced feature called <code class="docutils literal notranslate"><span class="pre">XCom</span></code>.</p>
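<p>Conceptually, an <code class="docutils literal notranslate"><span class="pre">XCom</span></code> is a small value exchanged between tasks, keyed by DAG, task and a name. A toy sketch of the idea (real XComs live in Airflow&#x2019;s metadata database and are pushed and pulled from within running task instances; this dict-backed version only illustrates the shape):</p>

```python
class ToyXCom:
    """Toy stand-in for Airflow's cross-communication store.

    Real XComs are rows in the metadata database, written and read by
    running task instances; this is an illustration, not Airflow's code.
    """

    def __init__(self):
        self._store = {}

    def push(self, dag_id, task_id, key, value):
        self._store[(dag_id, task_id, key)] = value

    def pull(self, dag_id, task_id, key):
        return self._store.get((dag_id, task_id, key))

xcom = ToyXCom()
xcom.push("tutorial", "print_date", "run_date", "2015-06-01")
print(xcom.pull("tutorial", "print_date", "run_date"))  # 2015-06-01
```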
<p>People sometimes think of the DAG definition file as a place where they
can do some actual data processing - that is not the case at all!
The script&#x2019;s purpose is to define a DAG object. It needs to evaluate
quickly (seconds, not minutes) since the scheduler will execute it
periodically to reflect the changes if any.</p>
</div>
<div class="section" id="importing-modules">
<h2 class="sigil_not_in_toc">Importing Modules</h2>
<p>An Airflow pipeline is just a Python script that happens to define an
Airflow DAG object. Let&#x2019;s start by importing the libraries we will need.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># The DAG object; we&apos;ll need this to instantiate a DAG</span>
<span class="kn">from</span> <span class="nn">airflow</span> <span class="k">import</span> <span class="n">DAG</span>
<span class="c1"># Operators; we need this to operate!</span>
<span class="kn">from</span> <span class="nn">airflow.operators.bash_operator</span> <span class="k">import</span> <span class="n">BashOperator</span>
</pre>
</div>
</div>
</div>
<div class="section" id="default-arguments">
<h2 class="sigil_not_in_toc">Default Arguments</h2>
<p>We&#x2019;re about to create a DAG and some tasks, and we have the choice to
explicitly pass a set of arguments to each task&#x2019;s constructor
(which would become redundant), or (better!) we can define a dictionary
of default parameters that we can use when creating tasks.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&apos;owner&apos;</span><span class="p">:</span> <span class="s1">&apos;airflow&apos;</span><span class="p">,</span>
<span class="s1">&apos;depends_on_past&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;start_date&apos;</span><span class="p">:</span> <span class="n">datetime</span><span class="p">(</span><span class="mi">2015</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s1">&apos;email&apos;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&apos;airflow@example.com&apos;</span><span class="p">],</span>
<span class="s1">&apos;email_on_failure&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;email_on_retry&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;retries&apos;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">&apos;retry_delay&apos;</span><span class="p">:</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">5</span><span class="p">),</span>
<span class="c1"># &apos;queue&apos;: &apos;bash_queue&apos;,</span>
<span class="c1"># &apos;pool&apos;: &apos;backfill&apos;,</span>
<span class="c1"># &apos;priority_weight&apos;: 10,</span>
<span class="c1"># &apos;end_date&apos;: datetime(2016, 1, 1),</span>
<span class="p">}</span>
</pre>
</div>
</div>
<p>For more information about the BaseOperator&#x2019;s parameters and what they do,
refer to the <a class="reference internal" href="code.html#airflow.models.BaseOperator" title="airflow.models.BaseOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">airflow.models.BaseOperator</span></code></a> documentation.</p>
<p>Also, note that you could easily define different sets of arguments that
would serve different purposes. An example of that would be to have
different settings between a production and development environment.</p>
</div>
<div class="section" id="instantiate-a-dag">
<h2 class="sigil_not_in_toc">Instantiate a DAG</h2>
<p>We&#x2019;ll need a DAG object to nest our tasks into. Here we pass a string
that defines the <code class="docutils literal notranslate"><span class="pre">dag_id</span></code>, which serves as a unique identifier for your DAG.
We also pass the default argument dictionary that we just defined and
define a <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> of 1 day for the DAG.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span>
<span class="s1">&apos;tutorial&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">,</span> <span class="n">schedule_interval</span><span class="o">=</span><span class="n">timedelta</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
</pre>
</div>
</div>
</div>
<div class="section" id="tasks">
<h2 class="sigil_not_in_toc">Tasks</h2>
<p>Tasks are generated when instantiating operator objects. An object
instantiated from an operator is called a task. The first argument
<code class="docutils literal notranslate"><span class="pre">task_id</span></code> acts as a unique identifier for the task.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">t1</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;print_date&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;date&apos;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">t2</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;sleep&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;sleep 5&apos;</span><span class="p">,</span>
<span class="n">retries</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
<p>Notice how we pass a mix of operator specific arguments (<code class="docutils literal notranslate"><span class="pre">bash_command</span></code>) and
an argument common to all operators (<code class="docutils literal notranslate"><span class="pre">retries</span></code>) inherited
from BaseOperator to the operator&#x2019;s constructor. This is simpler than
passing every argument for every constructor call. Also, notice that in
the second task we override the <code class="docutils literal notranslate"><span class="pre">retries</span></code> parameter with <code class="docutils literal notranslate"><span class="pre">3</span></code>.</p>
<p>The precedence rules for a task are as follows:</p>
<ol class="arabic simple">
<li>Explicitly passed arguments</li>
<li>Values that exist in the <code class="docutils literal notranslate"><span class="pre">default_args</span></code> dictionary</li>
<li>The operator&#x2019;s default value, if one exists</li>
</ol>
<p>A task must include or inherit the arguments <code class="docutils literal notranslate"><span class="pre">task_id</span></code> and <code class="docutils literal notranslate"><span class="pre">owner</span></code>,
otherwise Airflow will raise an exception.</p>
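<p>The precedence rules can be sketched in plain Python (an illustration of the lookup order, not Airflow&#x2019;s actual code):</p>

```python
def resolve_arg(name, explicit_args, default_args, operator_defaults):
    """Explicit arguments win, then default_args, then the operator's own default."""
    if name in explicit_args:
        return explicit_args[name]
    if name in default_args:
        return default_args[name]
    return operator_defaults[name]

default_args = {"owner": "airflow", "retries": 1}
explicit = {"retries": 3}  # like t2's retries=3 in this tutorial
op_defaults = {"retries": 0, "email_on_failure": True}

print(resolve_arg("retries", explicit, default_args, op_defaults))  # 3
print(resolve_arg("owner", explicit, default_args, op_defaults))  # airflow
print(resolve_arg("email_on_failure", explicit, default_args, op_defaults))  # True
```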
</div>
<div class="section" id="templating-with-jinja">
<h2 class="sigil_not_in_toc">Templating with Jinja</h2>
<p>Airflow leverages the power of
<a class="reference external" href="http://jinja.pocoo.org/docs/dev/">Jinja Templating</a> and provides
the pipeline author
with a set of built-in parameters and macros. Airflow also provides
hooks for the pipeline author to define their own parameters, macros and
templates.</p>
<p>This tutorial barely scratches the surface of what you can do with
templating in Airflow, but the goal of this section is to let you know
this feature exists, get you familiar with double curly brackets, and
point to the most common template variable: <code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">ds</span> <span class="pre">}}</span></code> (today&#x2019;s &#x201C;date
stamp&#x201D;).</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">templated_command</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span>
<span class="s2"> {</span><span class="si">% f</span><span class="s2">or i in range(5) %}</span>
<span class="s2"> echo &quot;{{ ds }}&quot;</span>
<span class="s2"> echo &quot;{{ macros.ds_add(ds, 7) }}&quot;</span>
<span class="s2"> echo &quot;{{ params.my_param }}&quot;</span>
<span class="s2"> {</span><span class="si">% e</span><span class="s2">ndfor %}</span>
<span class="s2">&quot;&quot;&quot;</span>
<span class="n">t3</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;templated&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="n">templated_command</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;my_param&apos;</span><span class="p">:</span> <span class="s1">&apos;Parameter I passed in&apos;</span><span class="p">},</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
<p>Notice that the <code class="docutils literal notranslate"><span class="pre">templated_command</span></code> contains code logic in <code class="docutils literal notranslate"><span class="pre">{%</span> <span class="pre">%}</span></code> blocks,
references parameters like <code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">ds</span> <span class="pre">}}</span></code>, calls a function as in
<code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">macros.ds_add(ds,</span> <span class="pre">7)</span> <span class="pre">}}</span></code>, and references a user-defined parameter
in <code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">params.my_param</span> <span class="pre">}}</span></code>.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">params</span></code> hook in <code class="docutils literal notranslate"><span class="pre">BaseOperator</span></code> allows you to pass a dictionary of
parameters and/or objects to your templates. Please take the time
to understand how the parameter <code class="docutils literal notranslate"><span class="pre">my_param</span></code> makes it through to the template.</p>
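<p>To see how <code class="docutils literal notranslate"><span class="pre">my_param</span></code> travels, you can render a similar template with the <code class="docutils literal notranslate"><span class="pre">jinja2</span></code> library directly (a standalone sketch; Airflow performs an equivalent render at runtime, with many more variables in context):</p>

```python
from jinja2 import Template  # Airflow's templating engine, used here standalone

templated_command = """
{% for i in range(2) %}
    echo "{{ ds }}"
    echo "{{ params.my_param }}"
{% endfor %}
"""

# Airflow supplies ds and params at runtime; here we pass them by hand.
rendered = Template(templated_command).render(
    ds="2015-06-01",
    params={"my_param": "Parameter I passed in"},
)
print(rendered)
```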
<p>Files can also be passed to the <code class="docutils literal notranslate"><span class="pre">bash_command</span></code> argument, like
<code class="docutils literal notranslate"><span class="pre">bash_command=&apos;templated_command.sh&apos;</span></code>, where the file location is relative to
the directory containing the pipeline file (<code class="docutils literal notranslate"><span class="pre">tutorial.py</span></code> in this case). This
may be desirable for many reasons, like separating your script&#x2019;s logic and
pipeline code, allowing for proper code highlighting in files composed in
different languages, and general flexibility in structuring pipelines. It is
also possible to define your <code class="docutils literal notranslate"><span class="pre">template_searchpath</span></code> as pointing to any folder
locations in the DAG constructor call.</p>
<p>Using that same DAG constructor call, it is possible to define
<code class="docutils literal notranslate"><span class="pre">user_defined_macros</span></code> which allow you to specify your own variables.
For example, passing <code class="docutils literal notranslate"><span class="pre">dict(foo=&apos;bar&apos;)</span></code> to this argument allows you
to use <code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">foo</span> <span class="pre">}}</span></code> in your templates. Moreover, specifying
<code class="docutils literal notranslate"><span class="pre">user_defined_filters</span></code> allows you to register your own filters. For example,
passing <code class="docutils literal notranslate"><span class="pre">dict(hello=lambda</span> <span class="pre">name:</span> <span class="pre">&apos;Hello</span> <span class="pre">%s&apos;</span> <span class="pre">%</span> <span class="pre">name)</span></code> to this argument allows
you to use <code class="docutils literal notranslate"><span class="pre">{{</span> <span class="pre">&apos;world&apos;</span> <span class="pre">|</span> <span class="pre">hello</span> <span class="pre">}}</span></code> in your templates. For more information
regarding custom filters, have a look at the
<a class="reference external" href="http://jinja.pocoo.org/docs/dev/api/#writing-filters">Jinja Documentation</a>.</p>
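<p>Outside of Airflow, the same effect can be reproduced with plain Jinja (a sketch of what <code class="docutils literal notranslate"><span class="pre">user_defined_macros</span></code> and <code class="docutils literal notranslate"><span class="pre">user_defined_filters</span></code> expose to your templates; Airflow wires this up for you in the DAG constructor):</p>

```python
from jinja2 import Environment

env = Environment()
# Equivalent of user_defined_filters=dict(hello=lambda name: 'Hello %s' % name)
env.filters["hello"] = lambda name: "Hello %s" % name

# Equivalent of user_defined_macros=dict(foo='bar'):
# foo becomes available in the render context.
tmpl = env.from_string("{{ 'world' | hello }}, {{ foo }}")
print(tmpl.render(foo="bar"))  # Hello world, bar
```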
<p>For more information on the variables and macros that can be referenced
in templates, make sure to read through the <a class="reference internal" href="code.html#macros"><span class="std std-ref">Macros</span></a> section.</p>
</div>
<div class="section" id="setting-up-dependencies">
<h2 class="sigil_not_in_toc">Setting up Dependencies</h2>
<p>We have two simple tasks that do not depend on each other. Here are a few ways
you can define dependencies between them:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">t2</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
<span class="c1"># This means that t2 will depend on t1</span>
<span class="c1"># running successfully to run</span>
<span class="c1"># It is equivalent to</span>
<span class="c1"># t1.set_downstream(t2)</span>
<span class="n">t3</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
<span class="c1"># all of this is equivalent to</span>
<span class="c1"># dag.set_dependency(&apos;print_date&apos;, &apos;sleep&apos;)</span>
<span class="c1"># dag.set_dependency(&apos;print_date&apos;, &apos;templated&apos;)</span>
</pre>
</div>
</div>
<p>Note that when executing your script, Airflow will raise exceptions when
it finds cycles in your DAG or when a dependency is referenced more
than once.</p>
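<p>The cycle check boils down to a depth-first search over the task graph. A minimal sketch of such a check (an illustration, not Airflow&#x2019;s implementation):</p>

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as {node: [downstream, ...]}."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: we looped onto the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in edges if color.get(n, WHITE) == WHITE)

# t1 -> t2, t1 -> t3 (this tutorial's DAG): acyclic
print(has_cycle({"t1": ["t2", "t3"]}))  # False
# adding t2 -> t1 would create a cycle
print(has_cycle({"t1": ["t2"], "t2": ["t1"]}))  # True
```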
</div>
<div class="section" id="recap">
<h2 class="sigil_not_in_toc">Recap</h2>
<p>Alright, so we have a pretty basic DAG. At this point your code should look
something like this:</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">Code that goes along with the Airflow located at:</span>
<span class="sd">http://airflow.readthedocs.org/en/latest/tutorial.html</span>
<span class="sd">&quot;&quot;&quot;</span>
<span class="kn">from</span> <span class="nn">airflow</span> <span class="k">import</span> <span class="n">DAG</span>
<span class="kn">from</span> <span class="nn">airflow.operators.bash_operator</span> <span class="k">import</span> <span class="n">BashOperator</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&apos;owner&apos;</span><span class="p">:</span> <span class="s1">&apos;airflow&apos;</span><span class="p">,</span>
<span class="s1">&apos;depends_on_past&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;start_date&apos;</span><span class="p">:</span> <span class="n">datetime</span><span class="p">(</span><span class="mi">2015</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="s1">&apos;email&apos;</span><span class="p">:</span> <span class="p">[</span><span class="s1">&apos;airflow@example.com&apos;</span><span class="p">],</span>
<span class="s1">&apos;email_on_failure&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;email_on_retry&apos;</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">&apos;retries&apos;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s1">&apos;retry_delay&apos;</span><span class="p">:</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">5</span><span class="p">),</span>
<span class="c1"># &apos;queue&apos;: &apos;bash_queue&apos;,</span>
<span class="c1"># &apos;pool&apos;: &apos;backfill&apos;,</span>
<span class="c1"># &apos;priority_weight&apos;: 10,</span>
<span class="c1"># &apos;end_date&apos;: datetime(2016, 1, 1),</span>
<span class="p">}</span>
<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span>
<span class="s1">&apos;tutorial&apos;</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">,</span> <span class="n">schedule_interval</span><span class="o">=</span><span class="n">timedelta</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
<span class="c1"># t1, t2 and t3 are examples of tasks created by instantiating operators</span>
<span class="n">t1</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;print_date&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;date&apos;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">t2</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;sleep&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;sleep 5&apos;</span><span class="p">,</span>
<span class="n">retries</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">templated_command</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span>
<span class="s2"> {</span><span class="si">% f</span><span class="s2">or i in range(5) %}</span>
<span class="s2"> echo &quot;{{ ds }}&quot;</span>
<span class="s2"> echo &quot;{{ macros.ds_add(ds, 7)}}&quot;</span>
<span class="s2"> echo &quot;{{ params.my_param }}&quot;</span>
<span class="s2"> {</span><span class="si">% e</span><span class="s2">ndfor %}</span>
<span class="s2">&quot;&quot;&quot;</span>
<span class="n">t3</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;templated&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="n">templated_command</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;my_param&apos;</span><span class="p">:</span> <span class="s1">&apos;Parameter I passed in&apos;</span><span class="p">},</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">t2</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
<span class="n">t3</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
<div class="section" id="testing">
<h2 class="sigil_not_in_toc">Testing</h2>
<div class="section" id="running-the-script">
<h3 class="sigil_not_in_toc">Running the Script</h3>
<p>Time to run some tests. First let&#x2019;s make sure that the pipeline
parses. Let&#x2019;s assume we&#x2019;re saving the code from the previous step in
<code class="docutils literal notranslate"><span class="pre">tutorial.py</span></code> in the DAGs folder referenced in your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.
The default location for your DAGs is <code class="docutils literal notranslate"><span class="pre">~/airflow/dags</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python ~/airflow/dags/tutorial.py
</pre>
</div>
</div>
<p>If the script does not raise an exception it means that you haven&#x2019;t done
anything horribly wrong, and that your Airflow environment is somewhat
sound.</p>
</div>
<div class="section" id="command-line-metadata-validation">
<h3 class="sigil_not_in_toc">Command Line Metadata Validation</h3>
<p>Let&#x2019;s run a few commands to validate this script further.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># print the list of active DAGs</span>
airflow list_dags
<span class="c1"># prints the list of tasks in the &quot;tutorial&quot; DAG</span>
airflow list_tasks tutorial
<span class="c1"># prints the hierarchy of tasks in the tutorial DAG</span>
airflow list_tasks tutorial --tree
</pre>
</div>
</div>
</div>
<div class="section" id="id1">
<h3 class="sigil_not_in_toc">Testing</h3>
<p>Let&#x2019;s test by running the actual task instances on a specific date. The
date specified in this context is an <code class="docutils literal notranslate"><span class="pre">execution_date</span></code>, which simulates the
scheduler running your task or dag at a specific date + time:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># command layout: command subcommand dag_id task_id date</span>
<span class="c1"># testing print_date</span>
airflow <span class="nb">test</span> tutorial print_date <span class="m">2015</span>-06-01
<span class="c1"># testing sleep</span>
airflow <span class="nb">test</span> tutorial sleep <span class="m">2015</span>-06-01
</pre>
</div>
</div>
<p>Now remember what we did with templating earlier? See how this template
gets rendered and executed by running this command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># testing templated</span>
airflow <span class="nb">test</span> tutorial templated <span class="m">2015</span>-06-01
</pre>
</div>
</div>
<p>This should result in displaying a verbose log of events and ultimately
running your bash command and printing the result.</p>
<p>Note that the <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">test</span></code> command runs task instances locally, outputs
their log to stdout (on screen), doesn&#x2019;t bother with dependencies, and
doesn&#x2019;t communicate state (running, success, failed, &#x2026;) to the database.
It simply allows testing a single task instance.</p>
</div>
<div class="section" id="backfill">
<h3 class="sigil_not_in_toc">Backfill</h3>
<p>Everything looks like it&#x2019;s running fine so let&#x2019;s run a backfill.
<code class="docutils literal notranslate"><span class="pre">backfill</span></code> will respect your dependencies, emit logs into files and talk to
the database to record status. If you do have a webserver up, you&#x2019;ll be able
to track the progress. <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">webserver</span></code> will start a web server if you
are interested in tracking the progress visually as your backfill progresses.</p>
<p>Note that if you use <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code>, individual task instances
will depend on the success of the preceding task instance, except for the
start_date specified itself, for which this dependency is disregarded.</p>
<p>The date range in this context is a <code class="docutils literal notranslate"><span class="pre">start_date</span></code> and optionally an <code class="docutils literal notranslate"><span class="pre">end_date</span></code>,
which are used to populate the run schedule with task instances from this dag.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># optional, start a web server in debug mode in the background</span>
<span class="c1"># airflow webserver --debug &amp;</span>
<span class="c1"># start your backfill on a date range</span>
airflow backfill tutorial -s <span class="m">2015</span>-06-01 -e <span class="m">2015</span>-06-07
</pre>
</div>
</div>
</div>
</div>
<div class="section" id="what-s-next">
<h2 class="sigil_not_in_toc">What&#x2019;s Next?</h2>
<p>That&#x2019;s it, you&#x2019;ve written, tested and backfilled your very first Airflow
pipeline. Merging your code into a code repository that has a master scheduler
running against it should get it triggered and run every day.</p>
<p>Here are a few things you might want to do next:</p>
<ul>
<li><p class="first">Take an in-depth tour of the UI - click all the things!</p>
</li>
<li><p class="first">Keep reading the docs! Especially the sections on:</p>
<blockquote>
<div><ul class="simple">
<li>Command line interface</li>
<li>Operators</li>
<li>Macros</li>
</ul>
</div>
</blockquote>
</li>
<li><p class="first">Write your first pipeline!</p>
</li>
</ul>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>How-to Guides</h1>
<p>Setting up the sandbox in the <a class="reference internal" href="../start.html"><span class="doc">Quick Start</span></a> section was easy;
building a production-grade environment requires a bit more work!</p>
<p>These how-to guides will step you through common tasks in using and
configuring an Airflow environment.</p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="set-config.html">Setting Configuration Options</a></li>
<li class="toctree-l1"><a class="reference internal" href="initialize-database.html">Initializing a Database Backend</a></li>
<li class="toctree-l1"><a class="reference internal" href="operator.html">Using Operators</a><ul>
<li class="toctree-l2"><a class="reference internal" href="operator.html#bashoperator">BashOperator</a></li>
<li class="toctree-l2"><a class="reference internal" href="operator.html#pythonoperator">PythonOperator</a></li>
<li class="toctree-l2"><a class="reference internal" href="operator.html#google-cloud-platform-operators">Google Cloud Platform Operators</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="manage-connections.html">Managing Connections</a><ul>
<li class="toctree-l2"><a class="reference internal" href="manage-connections.html#creating-a-connection-with-the-ui">Creating a Connection with the UI</a></li>
<li class="toctree-l2"><a class="reference internal" href="manage-connections.html#editing-a-connection-with-the-ui">Editing a Connection with the UI</a></li>
<li class="toctree-l2"><a class="reference internal" href="manage-connections.html#creating-a-connection-with-environment-variables">Creating a Connection with Environment Variables</a></li>
<li class="toctree-l2"><a class="reference internal" href="manage-connections.html#connection-types">Connection Types</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="secure-connections.html">Securing Connections</a></li>
<li class="toctree-l1"><a class="reference internal" href="write-logs.html">Writing Logs</a><ul>
<li class="toctree-l2"><a class="reference internal" href="write-logs.html#writing-logs-locally">Writing Logs Locally</a></li>
<li class="toctree-l2"><a class="reference internal" href="write-logs.html#writing-logs-to-amazon-s3">Writing Logs to Amazon S3</a></li>
<li class="toctree-l2"><a class="reference internal" href="write-logs.html#writing-logs-to-azure-blob-storage">Writing Logs to Azure Blob Storage</a></li>
<li class="toctree-l2"><a class="reference internal" href="write-logs.html#writing-logs-to-google-cloud-storage">Writing Logs to Google Cloud Storage</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="executor/use-celery.html">Scaling Out with Celery</a></li>
<li class="toctree-l1"><a class="reference internal" href="executor/use-dask.html">Scaling Out with Dask</a></li>
<li class="toctree-l1"><a class="reference internal" href="executor/use-mesos.html">Scaling Out with Mesos (community contributed)</a><ul>
<li class="toctree-l2"><a class="reference internal" href="executor/use-mesos.html#tasks-executed-directly-on-mesos-slaves">Tasks executed directly on mesos slaves</a></li>
<li class="toctree-l2"><a class="reference internal" href="executor/use-mesos.html#tasks-executed-in-containers-on-mesos-slaves">Tasks executed in containers on mesos slaves</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="run-with-systemd.html">Running Airflow with systemd</a></li>
<li class="toctree-l1"><a class="reference internal" href="run-with-upstart.html">Running Airflow with upstart</a></li>
<li class="toctree-l1"><a class="reference internal" href="use-test-config.html">Using the Test Mode Configuration</a></li>
</ul>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Setting Configuration Options</h1>
<p>The first time you run Airflow, it will create a file called <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> in
your <code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME</span></code> directory (<code class="docutils literal notranslate"><span class="pre">~/airflow</span></code> by default). This file contains Airflow&#x2019;s configuration and you
can edit it to change any of the settings. You can also set options with environment variables by using this format:
<code class="docutils literal notranslate"><span class="pre">$AIRFLOW__{SECTION}__{KEY}</span></code> (note the double underscores).</p>
<p>For example, the
metadata database connection string can either be set in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> like this:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>core<span class="o">]</span>
<span class="nv">sql_alchemy_conn</span> <span class="o">=</span> my_conn_string
</pre>
</div>
</div>
<p>or by creating a corresponding environment variable:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">AIRFLOW__CORE__SQL_ALCHEMY_CONN</span><span class="o">=</span>my_conn_string
</pre>
</div>
</div>
<p>You can also derive the connection string at run time by appending <code class="docutils literal notranslate"><span class="pre">_cmd</span></code> to the key like this:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="o">[</span>core<span class="o">]</span>
<span class="nv">sql_alchemy_conn_cmd</span> <span class="o">=</span> bash_command_to_run
</pre>
</div>
</div>
<p>Only three such configuration elements, namely <code class="docutils literal notranslate"><span class="pre">sql_alchemy_conn</span></code>, <code class="docutils literal notranslate"><span class="pre">broker_url</span></code> and <code class="docutils literal notranslate"><span class="pre">result_backend</span></code>, can be fetched as a command. The idea behind this is to avoid storing passwords on boxes in plain text files. The order of precedence is as follows:</p>
<ol class="arabic simple">
<li>environment variable</li>
<li>configuration in airflow.cfg</li>
<li>command in airflow.cfg</li>
<li>default</li>
</ol>
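<p>As an illustration only (not Airflow's actual implementation), the precedence above can be sketched as a lookup function; the function name and simplified config structure here are assumptions:</p>

```python
import os
import subprocess

def get_option(section, key, cfg, defaults):
    """Sketch of the lookup order: environment variable, then a plain
    value in airflow.cfg, then a *_cmd command, then the default."""
    # 1. environment variable AIRFLOW__{SECTION}__{KEY}
    env_var = 'AIRFLOW__{}__{}'.format(section.upper(), key.upper())
    if env_var in os.environ:
        return os.environ[env_var]
    # 2. plain value in airflow.cfg
    if key in cfg.get(section, {}):
        return cfg[section][key]
    # 3. command stored under <key>_cmd in airflow.cfg
    cmd = cfg.get(section, {}).get(key + '_cmd')
    if cmd is not None:
        return subprocess.check_output(cmd, shell=True).decode().strip()
    # 4. built-in default
    return defaults.get(section, {}).get(key)

cfg = {'core': {'sql_alchemy_conn_cmd': 'echo my_conn_string'}}
print(get_option('core', 'sql_alchemy_conn', cfg, {}))
```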
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Initializing a Database Backend</h1>
<p>If you want to take a real test drive of Airflow, you should consider
setting up a real database backend and switching to the LocalExecutor.</p>
<p>As Airflow was built to interact with its metadata using the great SqlAlchemy
library, you should be able to use any database backend supported as a
SqlAlchemy backend. We recommend using <strong>MySQL</strong> or <strong>Postgres</strong>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">We rely on stricter ANSI SQL settings for MySQL in order to have
sane defaults. Make sure to specify <cite>explicit_defaults_for_timestamp=1</cite>
in your my.cnf under <cite>[mysqld]</cite>.</p>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If you decide to use <strong>Postgres</strong>, we recommend using the <code class="docutils literal notranslate"><span class="pre">psycopg2</span></code>
driver and specifying it in your SqlAlchemy connection string.
Also note that since SqlAlchemy does not expose a way to target a
specific schema in the Postgres connection URI, you may
want to set a default schema for your role with a
command similar to <code class="docutils literal notranslate"><span class="pre">ALTER</span> <span class="pre">ROLE</span> <span class="pre">username</span> <span class="pre">SET</span> <span class="pre">search_path</span> <span class="pre">=</span> <span class="pre">airflow,</span> <span class="pre">foobar;</span></code></p>
</div>
<p>Once you&#x2019;ve set up your database to host Airflow, you&#x2019;ll need to alter the
SqlAlchemy connection string located in your configuration file
<code class="docutils literal notranslate"><span class="pre">$AIRFLOW_HOME/airflow.cfg</span></code>. You should then also change the &#x201C;executor&#x201D;
setting to use &#x201C;LocalExecutor&#x201D;, an executor that can parallelize task
instances locally.</p>
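<p>For example (the connection values below are placeholders, not a working setup), the relevant lines in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> might look like:</p>

```ini
[core]
# Placeholder Postgres connection string using the recommended psycopg2 driver
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow
executor = LocalExecutor
```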
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># initialize the database</span>
airflow initdb
</pre>
</div>
</div>
</body>
</html>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>Using Operators</h1>
<p>An operator represents a single, ideally idempotent, task. Operators
determine what actually executes when your DAG runs.</p>
<p>See the <a class="reference internal" href="../concepts.html#concepts-operators"><span class="std std-ref">Operators Concepts</span></a> documentation and the
<a class="reference internal" href="../code.html#api-reference-operators"><span class="std std-ref">Operators API Reference</span></a> for more
information.</p>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#bashoperator" id="id2">BashOperator</a><ul>
<li><a class="reference internal" href="#templating" id="id3">Templating</a></li>
<li><a class="reference internal" href="#troubleshooting" id="id4">Troubleshooting</a><ul>
<li><a class="reference internal" href="#jinja-template-not-found" id="id5">Jinja template not found</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#pythonoperator" id="id6">PythonOperator</a><ul>
<li><a class="reference internal" href="#passing-in-arguments" id="id7">Passing in arguments</a></li>
<li><a class="reference internal" href="#id1" id="id8">Templating</a></li>
</ul>
</li>
<li><a class="reference internal" href="#google-cloud-platform-operators" id="id9">Google Cloud Platform Operators</a><ul>
<li><a class="reference internal" href="#googlecloudstoragetobigqueryoperator" id="id10">GoogleCloudStorageToBigQueryOperator</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="bashoperator">
<h2 class="sigil_not_in_toc"><a class="toc-backref" href="#id2">BashOperator</a></h2>
<p>Use the <a class="reference internal" href="../code.html#airflow.operators.bash_operator.BashOperator" title="airflow.operators.bash_operator.BashOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">BashOperator</span></code></a> to execute
commands in a <a class="reference external" href="https://www.gnu.org/software/bash/">Bash</a> shell.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">run_this</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;run_after_loop&apos;</span><span class="p">,</span> <span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;echo 1&apos;</span><span class="p">,</span> <span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
<div class="section" id="templating">
<h3 class="sigil_not_in_toc"><a class="toc-backref" href="#id3">Templating</a></h3>
<p>You can use <a class="reference internal" href="../concepts.html#jinja-templating"><span class="std std-ref">Jinja templates</span></a> to parameterize the
<code class="docutils literal notranslate"><span class="pre">bash_command</span></code> argument.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">task</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;also_run_this&apos;</span><span class="p">,</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s1">&apos;echo &quot;run_id={{ run_id }} | dag_run={{ dag_run }}&quot;&apos;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
<div class="section" id="troubleshooting">
<h3 class="sigil_not_in_toc"><a class="toc-backref" href="#id4">Troubleshooting</a></h3>
<div class="section" id="jinja-template-not-found">
<h4 class="sigil_not_in_toc"><a class="toc-backref" href="#id5">Jinja template not found</a></h4>
<p>Add a space after the script name when directly calling a Bash script with
the <code class="docutils literal notranslate"><span class="pre">bash_command</span></code> argument. This is because Airflow tries to treat the value as the
path to a Jinja template file and render it, which will fail; the trailing space makes Airflow run it as an inline command instead.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">t2</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;bash_example&apos;</span><span class="p">,</span>
<span class="c1"># This fails with `Jinja template not found` error</span>
<span class="c1"># bash_command=&quot;/home/batcher/test.sh&quot;,</span>
<span class="c1"># This works (has a space after)</span>
<span class="n">bash_command</span><span class="o">=</span><span class="s2">&quot;/home/batcher/test.sh &quot;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="section" id="pythonoperator">
<h2 class="sigil_not_in_toc"><a class="toc-backref" href="#id6">PythonOperator</a></h2>
<p>Use the <a class="reference internal" href="../code.html#airflow.operators.python_operator.PythonOperator" title="airflow.operators.python_operator.PythonOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">PythonOperator</span></code></a> to execute
Python callables.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">print_context</span><span class="p">(</span><span class="n">ds</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">pprint</span><span class="p">(</span><span class="n">kwargs</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">ds</span><span class="p">)</span>
<span class="k">return</span> <span class="s1">&apos;Whatever you return gets printed in the logs&apos;</span>
<span class="n">run_this</span> <span class="o">=</span> <span class="n">PythonOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;print_the_context&apos;</span><span class="p">,</span>
<span class="n">provide_context</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">python_callable</span><span class="o">=</span><span class="n">print_context</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
<div class="section" id="passing-in-arguments">
<h3 class="sigil_not_in_toc"><a class="toc-backref" href="#id7">Passing in arguments</a></h3>
<p>Use the <code class="docutils literal notranslate"><span class="pre">op_args</span></code> and <code class="docutils literal notranslate"><span class="pre">op_kwargs</span></code> arguments to pass additional arguments
to the Python callable.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">my_sleeping_function</span><span class="p">(</span><span class="n">random_base</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot;This is a function that will run within the DAG execution&quot;&quot;&quot;</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">random_base</span><span class="p">)</span>
<span class="c1"># Generate 10 sleeping tasks, sleeping from 0 to 4 seconds respectively</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
<span class="n">task</span> <span class="o">=</span> <span class="n">PythonOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;sleep_for_&apos;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">),</span>
<span class="n">python_callable</span><span class="o">=</span><span class="n">my_sleeping_function</span><span class="p">,</span>
<span class="n">op_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s1">&apos;random_base&apos;</span><span class="p">:</span> <span class="nb">float</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">/</span> <span class="mi">10</span><span class="p">},</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
<span class="n">task</span><span class="o">.</span><span class="n">set_upstream</span><span class="p">(</span><span class="n">run_this</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
<div class="section" id="id1">
<h3 class="sigil_not_in_toc"><a class="toc-backref" href="#id8">Templating</a></h3>
<p>When you set the <code class="docutils literal notranslate"><span class="pre">provide_context</span></code> argument to <code class="docutils literal notranslate"><span class="pre">True</span></code>, Airflow passes in
an additional set of keyword arguments: one for each of the <a class="reference internal" href="../code.html#macros"><span class="std std-ref">Jinja
template variables</span></a> and a <code class="docutils literal notranslate"><span class="pre">templates_dict</span></code> argument.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">templates_dict</span></code> argument is templated, so each value in the dictionary
is evaluated as a <a class="reference internal" href="../concepts.html#jinja-templating"><span class="std std-ref">Jinja template</span></a>.</p>
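<p>As a stand-in for Airflow's actual Jinja rendering (this minimal regex substitution is an illustrative assumption, not the real engine), each value in <code class="docutils literal notranslate"><span class="pre">templates_dict</span></code> is rendered against the task context before the callable runs:</p>

```python
import re

def render(template, context):
    # Minimal stand-in for Jinja: replace {{ name }} with context[name]
    return re.sub(r'\{\{\s*(\w+)\s*\}\}',
                  lambda m: str(context[m.group(1)]), template)

# A value as it might appear in templates_dict, using the ds template variable
templates_dict = {'log_path': '/var/log/app/{{ ds }}.log'}
context = {'ds': '2018-01-01'}

rendered = {k: render(v, context) for k, v in templates_dict.items()}
print(rendered['log_path'])  # /var/log/app/2018-01-01.log
```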
</div>
</div>
<div class="section" id="google-cloud-platform-operators">
<h2 class="sigil_not_in_toc"><a class="toc-backref" href="#id9">Google Cloud Platform Operators</a></h2>
<div class="section" id="googlecloudstoragetobigqueryoperator">
<h3 class="sigil_not_in_toc"><a class="toc-backref" href="#id10">GoogleCloudStorageToBigQueryOperator</a></h3>
<p>Use the
<a class="reference internal" href="../integration.html#airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator" title="airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator"><code class="xref py py-class docutils literal notranslate"><span class="pre">GoogleCloudStorageToBigQueryOperator</span></code></a>
to execute a BigQuery load job.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">load_csv</span> <span class="o">=</span> <span class="n">gcs_to_bq</span><span class="o">.</span><span class="n">GoogleCloudStorageToBigQueryOperator</span><span class="p">(</span>
<span class="n">task_id</span><span class="o">=</span><span class="s1">&apos;gcs_to_bq_example&apos;</span><span class="p">,</span>
<span class="n">bucket</span><span class="o">=</span><span class="s1">&apos;cloud-samples-data&apos;</span><span class="p">,</span>
<span class="n">source_objects</span><span class="o">=</span><span class="p">[</span><span class="s1">&apos;bigquery/us-states/us-states.csv&apos;</span><span class="p">],</span>
<span class="n">destination_project_dataset_table</span><span class="o">=</span><span class="s1">&apos;airflow_test.gcs_to_bq_table&apos;</span><span class="p">,</span>
<span class="n">schema_fields</span><span class="o">=</span><span class="p">[</span>
<span class="p">{</span><span class="s1">&apos;name&apos;</span><span class="p">:</span> <span class="s1">&apos;name&apos;</span><span class="p">,</span> <span class="s1">&apos;type&apos;</span><span class="p">:</span> <span class="s1">&apos;STRING&apos;</span><span class="p">,</span> <span class="s1">&apos;mode&apos;</span><span class="p">:</span> <span class="s1">&apos;NULLABLE&apos;</span><span class="p">},</span>
<span class="p">{</span><span class="s1">&apos;name&apos;</span><span class="p">:</span> <span class="s1">&apos;post_abbr&apos;</span><span class="p">,</span> <span class="s1">&apos;type&apos;</span><span class="p">:</span> <span class="s1">&apos;STRING&apos;</span><span class="p">,</span> <span class="s1">&apos;mode&apos;</span><span class="p">:</span> <span class="s1">&apos;NULLABLE&apos;</span><span class="p">},</span>
<span class="p">],</span>
<span class="n">write_disposition</span><span class="o">=</span><span class="s1">&apos;WRITE_TRUNCATE&apos;</span><span class="p">,</span>
<span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">)</span>
</pre>
</div>
</div>
</div>
</div>
</body>
</html>