<td>Append mode uses watermark to drop old aggregation state. But the output of a windowed aggregation is delayed by the late threshold specified in `withWatermark()`, because by the mode's semantics rows can be added to the Result Table only once, after they are finalized (i.e. after the watermark is crossed). See the <a href="http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking">Late Data</a> section for more details.<br /><br />Update mode uses watermark to drop old aggregation state.<br /><br />Complete mode does not drop old aggregation state since by definition this mode preserves all data in the Result Table.</td>
</tr>
<tr>
<td>Other aggregations</td>
<td>Complete, Update</td>
<td>Since no watermark is defined (it is only defined for the event-time aggregation category above), old aggregation state is not dropped.<br /><br />Append mode is not supported as aggregates can update, thus violating the semantics of this mode.</td>
</tr>
<tr>
<td colspan="2">Queries with <code>joins</code></td>
<td>Append</td>
<td>Update and Complete mode not supported yet. See the <a href="http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries">support matrix in the Join Operations section</a> for more details on what types of joins are supported.</td>
</tr>
<tr>
<td colspan="2">Other queries</td>
<td>Append, Update</td>
<td>Complete mode not supported as it is infeasible to keep all unaggregated data in the Result Table.</td>
</tr>
</tbody>
</table>
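To make the append-mode semantics above concrete, here is a minimal sketch of a watermarked windowed count written in append mode. The streaming DataFrame `events`, its columns `eventTime` and `word`, and the 10-minute window and late threshold are illustrative assumptions, not values from this guide.
```scala
import org.apache.spark.sql.functions.{col, window}

// Assumed streaming DataFrame `events` with columns eventTime (timestamp) and word (string).
val windowedCounts = events
  .withWatermark("eventTime", "10 minutes")                   // late threshold set by withWatermark()
  .groupBy(window(col("eventTime"), "10 minutes"), col("word"))
  .count()

// Append mode: a window's row is added to the Result Table only once,
// after the watermark crosses the end of the window and the aggregate is finalized.
windowedCounts.writeStream
  .outputMode("append")
  .format("console")
  .start()
```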
### `Output Sink`
There are several types of built-in output sinks.
- **File sink**: Stores the output to a directory.
```scala
writeStream
    .format("parquet")        // can be "orc", "json", "csv", etc.
    .option("path", "path/to/destination/dir")
    .start()
```
Here are the details of all the sinks in Spark.
<table>
<thead>
<tr>
<th>Sink</th>
<th>Supported Output Modes</th>
<th>Options</th>
<th>Fault-tolerant</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>File Sink</strong></td>
<td>Append</td>
<td><code>path</code>: path to the output directory, must be specified.<br/><br/>For file-format-specific options, see the related methods in DataFrameWriter (<a href="http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter">Scala</a>/<a href="http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html">Java</a>/<a href="http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter">Python</a>/<a href="http://spark.apache.org/docs/latest/api/R/write.stream.html">R</a>). E.g. for "parquet" format options see <code>DataFrameWriter.parquet()</code></td>
<td>Yes (exactly-once)</td>
<td>Supports writes to partitioned tables. Partitioning by time may be useful.</td>
</tr>
<tr>
<td><strong>Kafka Sink</strong></td>
<td>Append, Update, Complete</td>
<td>See the <a href="structured-streaming-kafka-integration-zh.md">Kafka Integration Guide</a></td>
<td>Yes (at-least-once)</td>
<td>More details in the <a href="structured-streaming-kafka-integration-zh.md">Kafka Integration Guide</a></td>
</tr>
<tr>
<td><strong>Foreach Sink</strong></td>
<td>Append, Update, Complete</td>
<td>None</td>
<td>Yes (at-least-once)</td>
<td>More details in the <a href="#使用 `Foreach` 和 `ForeachBatch`">next section</a></td>
</tr>
<tr>
<td><strong>ForeachBatch Sink</strong></td>
<td>Append, Update, Complete</td>
<td>None</td>
<td>Depends on the implementation</td>
<td>More details in the <a href="#使用 `Foreach` 和 `ForeachBatch`">next section</a></td>
</tr>
<tr>
<td><strong>Console Sink</strong></td>
<td>Append, Update, Complete</td>
<td><code>numRows</code>: Number of rows to print every trigger (default: 20)<br/><code>truncate</code>: Whether to truncate the output if too long (default: true)</td>
<td>No</td>
<td> </td>
</tr>
<tr>
<td><strong>Memory Sink</strong></td>
<td>Append, Complete</td>
<td>None</td>
<td>No. But in Complete Mode, restarted query will recreate the full table.</td>
<td>Table name is the query name.</td>
</tr>
</tbody>
</table>
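As a quick illustration of the two debugging sinks in the table, here is a minimal sketch; the streaming DataFrame `df`, the query/table name `tableName`, and the active `spark` session are assumptions, not names from this guide.
```scala
// Console sink (for debugging): print each trigger's result to stdout.
df.writeStream
  .format("console")
  .option("numRows", "50")      // rows to print per trigger (default: 20)
  .option("truncate", "false")  // do not truncate long values
  .start()

// Memory sink (for debugging): keep the result in an in-memory table that can
// be queried with regular SQL; the table name is the query name.
df.writeStream
  .format("memory")
  .queryName("tableName")       // assumed name; becomes the in-memory table name
  .start()
spark.sql("SELECT * FROM tableName").show()
```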
### Triggers
The trigger settings of a streaming query define the timing of streaming data processing, i.e. whether the query is executed as a micro-batch query with a fixed batch interval or as a continuous processing query.
<table>
<thead>
<tr>
<th>Trigger Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>unspecified (default)</strong></td>
<td>If no trigger setting is explicitly specified, then by default, the query will be executed in micro-batch mode, where micro-batches will be generated as soon as the previous micro-batch has completed processing.</td>
</tr>
<tr>
<td><strong>Fixed interval micro-batches</strong></td>
<td>The query will be executed in micro-batch mode, where micro-batches will be kicked off at the user-specified intervals.
<ul>
<li>If the previous micro-batch completes within the interval, then the engine will wait until the interval is over before kicking off the next micro-batch.</li>
<li>If the previous micro-batch takes longer than the interval to complete (i.e. if an interval boundary is missed), then the next micro-batch will start as soon as the previous one completes (i.e., it will not wait for the next interval boundary).</li>
<li>If no new data is available, then no micro-batch will be kicked off.</li>
</ul>
</td>
</tr>
<tr>
<td><strong>One-time micro-batch</strong></td>
<td>The query will execute <em>only one</em> micro-batch to process all the available data and then stop on its own. This is useful in scenarios where you want to periodically spin up a cluster, process everything that is available since the last period, and then shut down the cluster. In some cases, this may lead to significant cost savings.</td>
</tr>
<tr>
<td><strong>Continuous with fixed checkpoint interval</strong><br /><em>(experimental)</em></td>
<td>The query will be executed in the new low-latency, continuous processing mode. Read more about this in the <a href="http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#continuous-processing">Continuous Processing section</a> below.</td>
</tr>
</tbody>
</table>
Here are some code examples.
- Scala
```scala
import org.apache.spark.sql.streaming.Trigger
// Default trigger (runs micro-batch as soon as it can)
df.writeStream
.format("console")
.start()
// ProcessingTime trigger with two-seconds micro-batch interval
df.writeStream
.format("console")
.trigger(Trigger.ProcessingTime("2 seconds"))
.start()
// One-time trigger
df.writeStream
.format("console")
.trigger(Trigger.Once())
.start()
// Continuous trigger with one-second checkpointing interval
df.writeStream
.format("console")
.trigger(Trigger.Continuous("1 second"))
.start()
```
- Java
```java
import org.apache.spark.sql.streaming.Trigger;
// Default trigger (runs micro-batch as soon as it can)
df.writeStream
.format("console")
.start();
// ProcessingTime trigger with two-seconds micro-batch interval
df.writeStream
.format("console")
.trigger(Trigger.ProcessingTime("2 seconds"))
.start();
// One-time trigger
df.writeStream
.format("console")
.trigger(Trigger.Once())
.start();
// Continuous trigger with one-second checkpointing interval
df.writeStream
.format("console")
.trigger(Trigger.Continuous("1 second"))
.start();
```
## Managing Streaming Queries
The `StreamingQuery` object created when a query is started can be used to monitor and manage the query.
- Scala
```scala
val query = df.writeStream.format("console").start() // get the query object
query.id // get the unique identifier of the running query that persists across restarts from checkpoint data
query.runId // get the unique id of this run of the query, which will be generated at every start/restart
query.name // get the auto-generated or user-specified name of the query
query.explain() // print detailed explanations of the query
query.stop() // stop the query
query.awaitTermination() // block until query is terminated, with stop() or with error
query.exception // the exception if the query has been terminated with error
query.recentProgress // an array of the most recent progress updates for this query
query.lastProgress // the most recent progress update of this streaming query
```
- Java
```java
StreamingQuery query = df.writeStream().format("console").start(); // get the query object
query.id(); // get the unique identifier of the running query that persists across restarts from checkpoint data
query.runId(); // get the unique id of this run of the query, which will be generated at every start/restart
query.name(); // get the auto-generated or user-specified name of the query
query.explain(); // print detailed explanations of the query
query.stop(); // stop the query
query.awaitTermination(); // block until query is terminated, with stop() or with error
query.exception(); // the exception if the query has been terminated with error
query.recentProgress(); // an array of the most recent progress updates for this query
query.lastProgress(); // the most recent progress update of this streaming query