Commit 046e8864 authored by F fjy

more docs

Parent 2667cd51
@@ -4,7 +4,7 @@ layout: doc_page
Data Formats for Ingestion
==========================
Druid can ingest data in JSON, CSV, or TSV. While most examples in the documentation use data in JSON format, it is not difficult to configure Druid to ingest CSV or TSV data.
Druid can ingest data in JSON, CSV, or custom delimited data such as TSV. While most examples in the documentation use data in JSON format, it is not difficult to configure Druid to ingest CSV or other delimited data.
## Formatting the Data
The following are three samples of the data used in the [Wikipedia example](Tutorial:-Loading-Your-Data-Part-1.html).
@@ -41,8 +41,8 @@ _TSV_
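The sample rows themselves are collapsed in this diff. For reference, a single record in each of the three formats might look like the following sketch (the field names follow the Wikipedia example, but the values and the reduced column set are illustrative):

```
JSON: {"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger", "language": "en", "added": 57, "deleted": 200, "delta": -143}
CSV:  2013-08-31T01:02:33Z,Gypsy Danger,en,57,200,-143
TSV:  2013-08-31T01:02:33Z	Gypsy Danger	en	57	200	-143
```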
Note that the CSV and TSV data do not contain column headers. This becomes important when you configure ingestion, because the column names must then be supplied explicitly in the spec.
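For delimited data, the data-format portion of the spec therefore lists the columns in order. A minimal sketch for CSV, assuming the illustrative columns above (the surrounding spec depends on your ingestion method):

```json
"data" : {
  "format" : "csv",
  "columns" : ["timestamp", "page", "language", "added", "deleted", "delta"],
  "dimensions" : ["page", "language"]
}
```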
## Configuring Ingestion For the Indexing Service
If you use the [indexing service](Indexing-Service.html) for ingesting the data, a [task](Tasks.html) must be configured and submitted. Tasks are configured with a JSON object which, among other things, specifies the data source and type. In the Wikipedia example, JSON data was read from a local file. The task spec contains a firehose element to specify this:
## Configuration
All forms of Druid ingestion require some form of schema object. An example JSON blob describing the data format might look something like this:
```json
"firehose" : {
  ...
```
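The firehose block above is truncated in this diff. A complete local-file firehose for the Wikipedia example might look like the following sketch (the directory and file name are illustrative):

```json
"firehose" : {
  "type" : "local",
  "baseDir" : "examples/indexing/",
  "filter" : "wikipedia_data.json"
}
```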
---
layout: doc_page
---
Ingesting from Kafka 8
----------------------
The previous examples are for Kafka 7. To support Kafka 8, a couple of changes need to be made:
- Update realtime node's configs for Kafka 8 extensions
- e.g.
- `druid.extensions.coordinates=[...,"io.druid.extensions:druid-kafka-seven:0.6.136",...]`
- becomes
- `druid.extensions.coordinates=[...,"io.druid.extensions:druid-kafka-eight:0.6.136",...]`
- Update realtime task config for changed keys
- `firehose.type`, `plumber.rejectionPolicyFactory`, and all of the `firehose.consumerProps` settings change.
```json
"firehose" : {
"type" : "kafka-0.8",
"consumerProps" : {
"zookeeper.connect": "localhost:2181",
"zookeeper.connection.timeout.ms": "15000",
"zookeeper.session.timeout.ms": "15000",
"zookeeper.sync.time.ms": "5000",
"group.id": "topic-pixel-local",
"fetch.message.max.bytes": "1048586",
"auto.offset.reset": "largest",
"auto.commit.enable": "false"
},
"feed" : "druidtest",
"parser" : {
"timestampSpec" : {
"column" : "utcdt",
"format" : "iso"
},
"data" : {
"format" : "json"
},
"dimensionExclusions" : [
"wp"
]
}
},
"plumber" : {
"type" : "realtime",
"windowPeriod" : "PT10m",
"segmentGranularity":"hour",
"basePersistDirectory" : "/tmp/realtime/basePersist",
"rejectionPolicyFactory": {
"type": "messageTime"
}
}
```
\ No newline at end of file
---
layout: doc_page
---
Working with different versions of Hadoop may require a bit of extra work for the time being. We will make changes to support different Hadoop versions in the near future. If you have problems outside of these instructions, please feel free to contact us in IRC or on the [forum](https://groups.google.com/forum/#!forum/druid-development).
Working with Hadoop 2.x
-----------------------
The default version of Hadoop bundled with Druid is 2.3. This should work out of the box.
To override the default Hadoop version, both the Hadoop Index Task and the standalone Hadoop indexer support the parameter `hadoopDependencyCoordinates`. You can pass another set of Hadoop coordinates through this parameter (e.g., you can specify coordinates for Hadoop 2.4.0 as `["org.apache.hadoop:hadoop-client:2.4.0"]`).
The Hadoop Index Task takes this parameter as part of the task JSON, and the standalone Hadoop indexer takes it as a command-line argument.
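For the task route, a sketch of where the parameter sits in a Hadoop Index Task spec (every other field of the task is omitted here for brevity):

```json
{
  "type" : "index_hadoop",
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.4.0"]
}
```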
Working with Hadoop 1.x and older
---------------------------------
We recommend recompiling Druid with your particular version of Hadoop by changing the dependencies in Druid's pom.xml files. Make sure to also either override the default `hadoopDependencyCoordinates` in the code or pass in your Hadoop version as part of the indexing task.
\ No newline at end of file
@@ -341,60 +341,3 @@ Additional Information
----------------------
Getting data into Druid can be difficult for first-time users. Please don't hesitate to ask questions in our IRC channel or on our [google groups page](https://groups.google.com/forum/#!forum/druid-development).
Further Reading
---------------------
Ingesting from Kafka 8
---------------------------------
Continuing from the Kafka 7 examples, a couple of changes are needed to support Kafka 8:
- Update realtime node's configs for Kafka 8 extensions
- e.g.
- `druid.extensions.coordinates=[...,"io.druid.extensions:druid-kafka-seven:0.6.136",...]`
- becomes
- `druid.extensions.coordinates=[...,"io.druid.extensions:druid-kafka-eight:0.6.136",...]`
- Update realtime task config for changed keys
- `firehose.type`, `plumber.rejectionPolicyFactory`, and all of the `firehose.consumerProps` settings change.
```json
"firehose" : {
"type" : "kafka-0.8",
"consumerProps" : {
"zookeeper.connect": "localhost:2181",
"zookeeper.connection.timeout.ms": "15000",
"zookeeper.session.timeout.ms": "15000",
"zookeeper.sync.time.ms": "5000",
"group.id": "topic-pixel-local",
"fetch.message.max.bytes": "1048586",
"auto.offset.reset": "largest",
"auto.commit.enable": "false"
},
"feed" : "druidtest",
"parser" : {
"timestampSpec" : {
"column" : "utcdt",
"format" : "iso"
},
"data" : {
"format" : "json"
},
"dimensionExclusions" : [
"wp"
]
}
},
"plumber" : {
"type" : "realtime",
"windowPeriod" : "PT10m",
"segmentGranularity":"hour",
"basePersistDirectory" : "/tmp/realtime/basePersist",
"rejectionPolicyFactory": {
"type": "messageTime"
}
}
```
@@ -26,11 +26,14 @@ h2. Configuration
* "Historical":Historical-Config.html
* "Broker":Broker-Config.html
* "Realtime":Realtime-Config.html
* "Configuring Logging":./Logging.html
h2. Data Ingestion
* "Ingestion FAQ":./Ingestion-FAQ.html
* "Realtime":./Realtime-ingestion.html
** "Kafka-0.8.x Ingestion":./Kafka-Eight.html
* "Batch":./Batch-ingestion.html
** "Different Hadoop Versions":./Other-Hadoop.html
* "Indexing Service":./Indexing-Service.html
** "Tasks":./Tasks.html
* "Data Formats":./Data_formats.html
@@ -39,8 +42,6 @@ h2. Operations
* "Performance FAQ":./Performance-FAQ.html
* "Extending Druid":./Modules.html
* "Booting a Production Cluster":./Booting-a-production-cluster.html
* "Performance FAQ":./Performance-FAQ.html
* "Logging":./Logging.html
h2. Querying
* "Querying":./Querying.html
...