Commit 0f4e5cb1, authored by Igal Levy

added firehose and plumber sections, which were being referenced but were missing

Parent 2245764b
---
layout: doc_page
---
Realtime Data Ingestion
=======================
For general Real-time Node information, see [here](Realtime.html).
......@@ -11,6 +13,7 @@ For writing your own plugins to the real-time node, see [Firehose](Firehose.html
Much of the configuration governing Realtime nodes and the ingestion of data is set in the Realtime spec file, discussed on this page.
<a id="realtime-specfile"></a>
## Realtime "specFile"
......@@ -81,6 +84,7 @@ This is a JSON Array so you can give more than one realtime stream to a given no
A realtime stream specification has four parts: `schema`, `config`, `firehose`, and `plumber`. Each is described below.
### Schema
This describes the data schema for the output Druid segment. More information about concepts in Druid and querying can be found at [Concepts-and-Terminology](Concepts-and-Terminology.html) and [Querying](Querying.html).
......@@ -92,6 +96,7 @@ This describes the data schema for the output Druid segment. More information ab
|indexGranularity|String|The granularity of the data inside the segment. For example, a value of "minute" means that data is aggregated at minute granularity: if several rows collide on the tuple (minute(timestamp), dimensions), their values are combined using the aggregators instead of being stored as individual rows.|yes|
|shardSpec|Object|This describes the shard that is represented by this server. This must be specified properly in order to have multiple realtime nodes indexing the same data stream in a sharded fashion.|no|
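
The `indexGranularity` behaviour described in the table can be illustrated with a small sketch. This is not Druid code: the `rollup` function, the event field names (`timestamp`, `page`, `count`), and the use of a plain sum as the aggregator are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Sum `metric` over events sharing (minute(timestamp), dimensions).

    Hypothetical helper: mimics a "minute" indexGranularity with a sum aggregator.
    """
    totals = defaultdict(int)
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        key = (minute.isoformat(), tuple(event[d] for d in dimensions))
        totals[key] += event[metric]
    return dict(totals)


# Illustrative events; the first two fall into the same minute and collide.
events = [
    {"timestamp": "2013-08-31T01:02:03", "page": "a", "count": 1},
    {"timestamp": "2013-08-31T01:02:45", "page": "a", "count": 2},
    {"timestamp": "2013-08-31T01:03:10", "page": "a", "count": 4},
]
rolled = rollup(events, ["page"], "count")
```

Here three input events produce two stored rows, because the first two share both the minute and the dimension values.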
### Config
This provides configuration for the data processing portion of the realtime stream processor.
......@@ -101,6 +106,22 @@ This provides configuration for the data processing portion of the realtime stre
|intermediatePersistPeriod|ISO8601 Period String|The period that determines the rate at which intermediate persists occur. These persists determine how often commits happen against the incoming realtime stream. If the realtime data loading process is interrupted at time T, it should be restarted so that it re-reads data that arrived from T minus this period onward.|yes|
|maxRowsInMemory|Number|The number of rows to aggregate before persisting. This number counts post-aggregation rows, so it is not the number of input events but the number of aggregated rows that those events produce. This is used to manage the required JVM heap size.|yes|
### Firehose
Firehoses describe the data stream source. See [Firehose](Firehose.html) for more information on firehose configuration.
### Plumber
The Plumber handles generated segments both while they are being generated and when they are "done". The configuration parameters in the example are:
* `type` specifies the type of plumber in terms of configuration schema. The Plumber configuration in the example is for the often-used RealtimePlumber.
* `windowPeriod` is the amount of lag time to allow for incoming events. The example configures a 10-minute window, meaning that any event with a timestamp more than 10 minutes in the past is thrown away and not included in the segment generated by the realtime server.
* `segmentGranularity` specifies the granularity of the segment, or the amount of time a segment will represent.
* `basePersistDirectory` is the directory in which persisted data is stored. The plumber is responsible for the actual intermediate persists, and this setting tells it where to write them.
See [Plumber](Plumber.html) for a fuller discussion of Plumber configuration.
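
The effect of `windowPeriod` described above can be sketched as a simple timestamp check. This is a minimal illustration, not the RealtimePlumber implementation: the function name `within_window` and the sample times are hypothetical, and only the "too old" side of the window described in the text is modelled.

```python
from datetime import datetime, timedelta

# Assumed value matching the 10-minute window described in the text.
WINDOW_PERIOD = timedelta(minutes=10)


def within_window(event_time, now, window=WINDOW_PERIOD):
    """Return True if the event is at most `window` older than `now`."""
    return now - event_time <= window


now = datetime(2013, 8, 31, 12, 0, 0)
recent = within_window(datetime(2013, 8, 31, 11, 55, 0), now)  # 5 minutes old
stale = within_window(datetime(2013, 8, 31, 11, 45, 0), now)   # 15 minutes old
```

Under these assumptions the 5-minute-old event is kept and the 15-minute-old event is discarded.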
Constraints
-----------
......