Commit 9796a40b authored by F fjy

port docs over to 0.6 and a bunch of misc fixes

Parent b85fbbfd
......@@ -26,7 +26,7 @@
<description>druid-cassandra-storage</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -26,7 +26,7 @@
<description>druid-common</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
---
layout: doc_page
---
Batch Data Ingestion
====================
There are two choices for batch data ingestion to your Druid cluster, you can use the [Indexing service](Indexing-service.html) or you can use the `HadoopDruidIndexer`. This page describes how to use the `HadoopDruidIndexer`.
There are two choices for batch data ingestion to your Druid cluster: you can use the [Indexing service](Indexing-service.html) or you can use the `HadoopDruidIndexer`.
Which should I use?
-------------------
The [Indexing service](Indexing-service.html) is a node that can run as part of your Druid cluster and can accomplish a number of different types of indexing tasks. Even if all you care about is batch indexing, it provides for the encapsulation of things like the Database that is used for segment metadata and other things, so that your indexing tasks do not need to include such information. Long-term, the indexing service is going to be the preferred method of ingesting data.
The [Indexing service](Indexing-service.html) is a node that can run as part of your Druid cluster and can accomplish a number of different types of indexing tasks. Even if all you care about is batch indexing, it provides for the encapsulation of things like the [database](MySQL.html) that is used for segment metadata and other things, so that your indexing tasks do not need to include such information. Long-term, the indexing service is going to be the preferred method of ingesting data.
The `HadoopDruidIndexer` runs Hadoop jobs in order to separate and index data segments. It takes advantage of Hadoop as a job scheduling and distributed job execution platform. It is a simple method if you already have Hadoop running and don’t want to spend the time configuring and deploying the [Indexing service](Indexing-service.html) just yet.
HadoopDruidIndexer
------------------
Batch Ingestion using the HadoopDruidIndexer
--------------------------------------------
The HadoopDruidIndexer can be run like so:
```
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath hadoop_config_path:`echo lib/* | tr ' ' ':'` io.druid.cli.Main index hadoop <config_file>
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:<hadoop_config_path> io.druid.cli.Main index hadoop <config_file>
```
The interval is the [ISO8601 interval](http://en.wikipedia.org/wiki/ISO_8601#Time_intervals) of the data you are processing. The config\_file is a path to a file (the "specFile") that contains JSON and an example looks like:
```
```json
{
"dataSource": "the_data_source",
"timestampColumn": "ts",
"timestampFormat": "<iso, millis, posix, auto or any Joda time format>",
"dataSpec": {
"format": "<csv, tsv, or json>",
"columns": ["ts", "column_1", "column_2", "column_3", "column_4", "column_5"],
"dimensions": ["column_1", "column_2", "column_3"]
"columns": [
"ts",
"column_1",
"column_2",
"column_3",
"column_4",
"column_5"
],
"dimensions": [
"column_1",
"column_2",
"column_3"
]
},
"granularitySpec": {
"type":"uniform",
"intervals":["<ISO8601 interval:http://en.wikipedia.org/wiki/ISO_8601#Time_intervals>"],
"gran":"day"
"type": "uniform",
"intervals": [
"<ISO8601 interval:http:\/\/en.wikipedia.org\/wiki\/ISO_8601#Time_intervals>"
],
"gran": "day"
},
"pathSpec": {
"type": "granularity",
"dataGranularity": "hour",
"inputPath": "s3n:\/\/billy-bucket\/the\/data\/is\/here",
"filePattern": ".*"
},
"pathSpec": { "type": "granularity",
"dataGranularity": "hour",
"inputPath": "s3n://billy-bucket/the/data/is/here",
"filePattern": ".*" },
"rollupSpec": { "aggs": [
{ "type": "count", "name":"event_count" },
{ "type": "doubleSum", "fieldName": "column_4", "name": "revenue" },
{ "type": "longSum", "fieldName" : "column_5", "name": "clicks" }
],
"rollupGranularity": "minute"},
"workingPath": "/tmp/path/on/hdfs",
"segmentOutputPath": "s3n://billy-bucket/the/segments/go/here",
"rollupSpec": {
"aggs": [
{
"type": "count",
"name": "event_count"
},
{
"type": "doubleSum",
"fieldName": "column_4",
"name": "revenue"
},
{
"type": "longSum",
"fieldName": "column_5",
"name": "clicks"
}
],
"rollupGranularity": "minute"
},
"workingPath": "\/tmp\/path\/on\/hdfs",
"segmentOutputPath": "s3n:\/\/billy-bucket\/the\/segments\/go\/here",
"leaveIntermediate": "false",
"partitionsSpec": {
"targetPartitionSize": 5000000
},
"updaterJobSpec": {
"type":"db",
"connectURI":"jdbc:mysql://localhost:7980/test_db",
"user":"username",
"password":"passmeup",
"segmentTable":"segments"
"type": "db",
"connectURI": "jdbc:mysql:\/\/localhost:7980\/test_db",
"user": "username",
"password": "passmeup",
"segmentTable": "segments"
}
}
```
......@@ -82,7 +107,6 @@ The interval is the [ISO8601 interval](http://en.wikipedia.org/wiki/ISO_8601#Tim
|leaveIntermediate|leave behind files in the workingPath when job completes or fails (debugging tool)|no|
|partitionsSpec|a specification of how to partition each time bucket into segments, absence of this property means no partitioning will occur|no|
|updaterJobSpec|a specification of how to update the metadata for the druid cluster these segments belong to|yes|
|registererers|a list of serde handler classnames|no|
### Path specification
......@@ -141,3 +165,67 @@ This is a specification of the properties that tell the job how to update metada
|segmentTable|table to use in DB|yes|
These properties should parrot what you have configured for your [Coordinator](Coordinator.html).
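To make "parrot" concrete, here is an illustrative sketch: the values in the `updaterJobSpec` above should agree with the coordinator's own database settings. The coordinator property names below are taken from the quick-start configs later in these docs, and the values simply mirror the sample spec; treat them as placeholders.

```
# Coordinator runtime.properties (sketch) -- should agree with the updaterJobSpec above
druid.db.connector.connectURI=jdbc:mysql://localhost:7980/test_db
druid.db.connector.user=username
druid.db.connector.password=passmeup
```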
Batch Ingestion Using the Indexing Service
------------------------------------------
Batch ingestion for the indexing service is done by submitting a [Hadoop Index Task](Tasks.html). The indexing service can be started by issuing:
```
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/overlord io.druid.cli.Main server overlord
```
This will start up a very simple local indexing service. For more complex deployments of the indexing service, see [here](Indexing-Service.html).
The schema of the Hadoop Index Task is very similar to the schema for the Hadoop Index Config. A sample Hadoop index task is shown below:
```json
{
"type" : "index_hadoop",
"config": {
"dataSource" : "example",
"timestampColumn" : "timestamp",
"timestampFormat" : "auto",
"dataSpec" : {
"format" : "json",
"dimensions" : ["dim1","dim2","dim3"]
},
"granularitySpec" : {
"type" : "uniform",
"gran" : "DAY",
"intervals" : [ "2013-08-31/2013-09-01" ]
},
"pathSpec" : {
"type" : "static",
"paths" : "data.json"
},
"targetPartitionSize" : 5000000,
"rollupSpec" : {
"aggs": [{
"type" : "count",
"name" : "count"
}, {
"type" : "doubleSum",
"name" : "added",
"fieldName" : "added"
}, {
"type" : "doubleSum",
"name" : "deleted",
"fieldName" : "deleted"
}, {
"type" : "doubleSum",
"name" : "delta",
"fieldName" : "delta"
}],
"rollupGranularity" : "none"
}
}
```
|property|description|required?|
|--------|-----------|---------|
|type|"index_hadoop"|yes|
|config|a Hadoop Index Config|yes|
The Hadoop Index Config submitted as part of a Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopBatchIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `updaterJobSpec`. The Indexing Service takes care of setting these fields internally.
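For illustration only, a task like the sample above could be submitted to the overlord started earlier with an HTTP POST. The `/druid/indexer/v1/task` path, the port placeholder, and the idea of saving the JSON to a file are assumptions and may differ for your deployment.

```
# Hypothetical submission of the sample task (saved as hadoop_index_task.json).
# The endpoint path and port below are assumed, not taken from this page.
curl -X POST -H 'Content-Type: application/json' \
  -d @hadoop_index_task.json \
  http://<OVERLORD_IP>:<PORT>/druid/indexer/v1/task
```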
......@@ -6,25 +6,73 @@ Broker
The Broker is the node to route queries to if you want to run a distributed cluster. It understands the metadata published to ZooKeeper about what segments exist on what nodes and routes queries such that they hit the right nodes. This node also merges the result sets from all of the individual nodes together.
Forwarding Queries
------------------
Quick Start
-----------
Run:
Most druid queries contain an interval object that indicates a span of time for which data is requested. Likewise, Druid [Segments](Segments.html) are partitioned to contain data for some interval of time and segments are distributed across a cluster. Consider a simple datasource with 7 segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple nodes, and hence, the query will likely hit multiple nodes.
```
io.druid.cli.Main server broker
```
To determine which nodes to forward queries to, the Broker node first builds a view of the world from information in Zookeeper. Zookeeper maintains information about [Historical](Historical.html) and [Realtime](Realtime.html) nodes and the segments they are serving. For every datasource in Zookeeper, the Broker node builds a timeline of segments and the nodes that serve them. When queries are received for a specific datasource and interval, the Broker node performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the nodes that contain data for the query. The Broker node then forwards down the query to the selected nodes.
With the following JVM configuration:
Caching
-------
```
-server
-Xmx256m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
druid.host=localhost
druid.service=broker
druid.port=8080
druid.zk.service.host=localhost
```
JVM Configuration
-----------------
The broker module uses several of the default modules in [Configuration](Configuration.html) and has the following set of configurations as well:
|Property|Description|Default|
|--------|-----------|-------|
|`druid.broker.cache.type`|Choices: local, memcache. The type of cache to use for queries.|local|
#### Local Cache
Broker nodes employ a distributed cache with a LRU cache invalidation strategy. The broker cache stores per segment results. The cache can be local to each broker node or shared across multiple nodes using an external distributed cache such as [memcached](http://memcached.org/). Each time a broker node receives a query, it first maps the query to a set of segments. A subset of these segment results may already exist in the cache and the results can be directly pulled from the cache. For any segment results that do not exist in the cache, the broker node will forward the query to the
historical nodes. Once the historical nodes return their results, the broker will store those results in the cache. Real-time segments are never cached and hence requests for real-time data will always be forwarded to real-time nodes. Real-time data is perpetually changing and caching the results would be unreliable.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.broker.cache.sizeInBytes`|Maximum cache size in bytes. Zero disables caching.|0|
|`druid.broker.cache.initialSize`|The initial size of the cache in bytes.|500000|
|`druid.broker.cache.logEvictionCount`|If this is non-zero, cache evictions are logged once every `logEvictionCount` evictions.|0|
#### Memcache
|Property|Description|Default|
|--------|-----------|-------|
|`druid.broker.cache.expiration`|Memcache [expiration time](https://code.google.com/p/memcached/wiki/NewCommands#Standard_Protocol).|2592000 (30 days)|
|`druid.broker.cache.timeout`|Maximum time in milliseconds to wait for a response from Memcache.|500|
|`druid.broker.cache.hosts`|Memcache hosts.|none|
|`druid.broker.cache.maxObjectSize`|Maximum object size in bytes for a Memcache object.|52428800 (50 MB)|
|`druid.broker.cache.memcachedPrefix`|Key prefix for all keys in Memcache.|druid|
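As a sketch of how these properties fit together, a broker runtime.properties enabling the memcached-backed cache might look roughly like this. The hostnames, ports, and comma-separated hosts format are illustrative assumptions; the property names and default-style values come from the tables above.

```
# Sketch only -- cache type from the table above, placeholder memcached hosts
druid.broker.cache.type=memcache
druid.broker.cache.hosts=memcached1.example.com:11211,memcached2.example.com:11211
druid.broker.cache.expiration=2592000
druid.broker.cache.timeout=500
druid.broker.cache.maxObjectSize=52428800
druid.broker.cache.memcachedPrefix=druid
```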
Running
-------
Broker nodes can be run using the `com.metamx.druid.http.BrokerMain` class.
```
io.druid.cli.Main server broker
```
Forwarding Queries
------------------
Most druid queries contain an interval object that indicates a span of time for which data is requested. Likewise, Druid [Segments](Segments.html) are partitioned to contain data for some interval of time and segments are distributed across a cluster. Consider a simple datasource with 7 segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple nodes, and hence, the query will likely hit multiple nodes.
To determine which nodes to forward queries to, the Broker node first builds a view of the world from information in Zookeeper. Zookeeper maintains information about [Historical](Historical.html) and [Realtime](Realtime.html) nodes and the segments they are serving. For every datasource in Zookeeper, the Broker node builds a timeline of segments and the nodes that serve them. When queries are received for a specific datasource and interval, the Broker node performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the nodes that contain data for the query. The Broker node then forwards down the query to the selected nodes.
Configuration
-------------
Caching
-------
See [Configuration](Configuration.html).
Broker nodes employ a cache with an LRU cache invalidation strategy. The broker cache stores per-segment results. The cache can be local to each broker node or shared across multiple nodes using an external distributed cache such as [memcached](http://memcached.org/). Each time a broker node receives a query, it first maps the query to a set of segments. A subset of these segment results may already exist in the cache and can be pulled directly from the cache. For any segment results that do not exist in the cache, the broker node will forward the query to the
historical nodes. Once the historical nodes return their results, the broker will store those results in the cache. Real-time segments are never cached and hence requests for real-time data will always be forwarded to real-time nodes. Real-time data is perpetually changing and caching the results would be unreliable.
\ No newline at end of file
---
layout: doc_page
---
A Druid cluster consists of various node types that need to be set up depending on your use case. See our [[Design]] docs for a description of the different node types.
A Druid cluster consists of various node types that need to be set up depending on your use case. See our [Design](Design.html) docs for a description of the different node types.
h2. Setup Scripts
Setup Scripts
-------------
One of our community members, "housejester":https://github.com/housejester/, contributed some scripts to help with setting up a cluster. Checkout the "github":https://github.com/housejester/druid-test-harness and "wiki":https://github.com/housejester/druid-test-harness/wiki/Druid-Test-Harness.
One of our community members, [housejester](https://github.com/housejester/), contributed some scripts to help with setting up a cluster. Checkout the [github](https://github.com/housejester/druid-test-harness) and [wiki](https://github.com/housejester/druid-test-harness/wiki/Druid-Test-Harness).
h2. Minimum Physical Layout: Absolute Minimum
Minimum Physical Layout: Absolute Minimum
-----------------------------------------
As a special case, the absolute minimum setup is one of the standalone examples for realtime ingestion and querying; see [[Examples]] that can easily run on one machine with one core and 1GB RAM. This layout can be set up to try some basic queries with Druid.
As a special case, the absolute minimum setup is one of the standalone examples for real-time ingestion and querying; see [Examples](Examples.html) that can easily run on one machine with one core and 1GB RAM. This layout can be set up to try some basic queries with Druid.
h2. Minimum Physical Layout: Experimental Testing with 4GB of RAM
Minimum Physical Layout: Experimental Testing with 4GB of RAM
-------------------------------------------------------------
This layout can be used to load some data from deep storage onto a Druid compute node for the first time. A minimal physical layout for a 1 or 2 core machine with 4GB of RAM is:
# node1: [[Master]] + metadata service + zookeeper + [[Compute]]
# transient nodes: indexer
1. node1: [Coordinator](Coordinator.html) + metadata service + zookeeper + [Historical](Historical.html)
2. transient nodes: [indexer](Batch-ingestion.html)
This setup is only reasonable to prove that a configuration works. It would not be worthwhile to use this layout for performance measurement.
This setup is only reasonable to prove that a configuration works. It would not be worthwhile to use this layout for performance measurement.
h2. Comfortable Physical Layout: Pilot Project with Multiple Machines
Comfortable Physical Layout: Pilot Project with Multiple Machines
-----------------------------------------------------------------
_The machine size "flavors" are using AWS/EC2 terminology for descriptive purposes only and is not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work._
The machine size "flavors" are using AWS/EC2 terminology for descriptive purposes only and is not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work.
A minimal physical layout not constrained by cores that demonstrates parallel querying and realtime, using AWS-EC2 "small"/m1.small (one core, with 1.7GB of RAM) or larger, no realtime, is:
A minimal physical layout not constrained by cores that demonstrates parallel querying and realtime, using AWS-EC2 "small"/m1.small (one core, with 1.7GB of RAM) or larger, no real-time, is:
# node1: [[Master]] (m1.small)
# node2: metadata service (m1.small)
# node3: zookeeper (m1.small)
# node4: [[Broker]] (m1.small or m1.medium or m1.large)
# node5: [[Compute]] (m1.small or m1.medium or m1.large)
# node6: [[Compute]] (m1.small or m1.medium or m1.large)
# node7: [[Realtime]] (m1.small or m1.medium or m1.large)
# transient nodes: indexer
1. node1: [Coordinator](Coordinator.html) (m1.small)
2. node2: metadata service (m1.small)
3. node3: zookeeper (m1.small)
4. node4: [Broker](Broker.html) (m1.small or m1.medium or m1.large)
5. node5: [Historical](Historical.html) (m1.small or m1.medium or m1.large)
6. node6: [Historical](Historical.html) (m1.small or m1.medium or m1.large)
7. node7: [Realtime](Realtime.html) (m1.small or m1.medium or m1.large)
8. transient nodes: [indexer](Batch-ingestion.html)
This layout naturally lends itself to adding more RAM and core to Compute nodes, and to adding many more Compute nodes. Depending on the actual load, the Master, metadata server, and Zookeeper might need to use larger machines.
This layout naturally lends itself to adding more RAM and cores to Compute nodes, and to adding many more Compute nodes. Depending on the actual load, the Master, metadata server, and Zookeeper might need to use larger machines.
h2. High Availability Physical Layout
High Availability Physical Layout
---------------------------------
_The machine size "flavors" are using AWS/EC2 terminology for descriptive purposes only and is not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work._
The machine size "flavors" are using AWS/EC2 terminology for descriptive purposes only and is not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work.
An HA layout allows full rolling restarts and heavy volume:
# node1: [[Master]] (m1.small or m1.medium or m1.large)
# node2: [[Master]] (m1.small or m1.medium or m1.large) (backup)
# node3: metadata service (c1.medium or m1.large)
# node4: metadata service (c1.medium or m1.large) (backup)
# node5: zookeeper (c1.medium)
# node6: zookeeper (c1.medium)
# node7: zookeeper (c1.medium)
# node8: [[Broker]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
# node9: [[Broker]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge) (backup)
# node10: [[Compute]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
# node11: [[Compute]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
# node12: [[Realtime]] (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
# transient nodes: indexer
1. node1: [Coordinator](Coordinator.html) (m1.small or m1.medium or m1.large)
2. node2: [Coordinator](Coordinator.html) (m1.small or m1.medium or m1.large) (backup)
3. node3: metadata service (c1.medium or m1.large)
4. node4: metadata service (c1.medium or m1.large) (backup)
5. node5: zookeeper (c1.medium)
6. node6: zookeeper (c1.medium)
7. node7: zookeeper (c1.medium)
8. node8: [Broker](Broker.html) (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
9. node9: [Broker](Broker.html) (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge) (backup)
10. node10: [Historical](Historical.html) (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
11. node11: [Historical](Historical.html) (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
12. node12: [Realtime](Realtime.html) (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
13. transient nodes: [indexer](Batch-ingestion.html)
h2. Sizing for Cores and RAM
Sizing for Cores and RAM
------------------------
The Compute and Broker nodes will use as many cores as are available, depending on usage, so it is best to keep these on dedicated machines. The upper limit of effectively utilized cores is not well characterized yet and would depend on types of queries, query load, and the schema. Compute daemons should have a heap a size of at least 1GB per core for normal usage, but could be squeezed into a smaller heap for testing. Since in-memory caching is essential for good performance, even more RAM is better. Broker nodes will use RAM for caching, so they do more than just route queries.
The Compute and Broker nodes will use as many cores as are available, depending on usage, so it is best to keep these on dedicated machines. The upper limit of effectively utilized cores is not well characterized yet and would depend on types of queries, query load, and the schema. Compute daemons should have a heap size of at least 1GB per core for normal usage, but could be squeezed into a smaller heap for testing. Since in-memory caching is essential for good performance, even more RAM is better. Broker nodes will use RAM for caching, so they do more than just route queries.
The effective utilization of cores by Zookeeper, MySQL, and Master nodes is likely to be between 1 and 2 for each process/daemon, so these could potentially share a machine with lots of cores. These daemons work with heap a size between 500MB and 1GB.
The effective utilization of cores by Zookeeper, MySQL, and Master nodes is likely to be between 1 and 2 for each process/daemon, so these could potentially share a machine with lots of cores. These daemons work with a heap size between 500MB and 1GB.
h2. Storage
Storage
-------
Indexed segments should be kept in a permanent store accessible by all nodes like AWS S3 or HDFS or equivalent. Currently Druid supports S3, but this will be extended soon.
Indexed segments should be kept in a permanent store accessible by all nodes like AWS S3 or HDFS or equivalent. Currently Druid supports S3, but this will be extended soon.
Local disk ("ephemeral" on AWS EC2) for caching is recommended over network mounted storage (example of mounted: AWS EBS, Elastic Block Store) in order to avoid network delays during times of heavy usage. If your data center is suitably provisioned for networked storage, perhaps with separate LAN/NICs just for storage, then mounted might work fine.
Local disk ("ephemeral" on AWS EC2) for caching is recommended over network mounted storage (example of mounted: AWS EBS, Elastic Block Store) in order to avoid network delays during times of heavy usage. If your data center is suitably provisioned for networked storage, perhaps with separate LAN/NICs just for storage, then mounted might work fine.
h2. Setup
Setup
-----
Setting up a cluster is essentially just firing up all of the nodes you want with the proper [[configuration]]. One thing to be aware of is that there are a few properties in the configuration that potentially need to be set individually for each process:
Setting up a cluster is essentially just firing up all of the nodes you want with the proper [configuration](Configuration.html). One thing to be aware of is that there are a few properties in the configuration that potentially need to be set individually for each process:
<pre>
<code>
```
druid.server.type=historical|realtime
druid.host=someHostOrIPaddrWithPort
druid.port=8080
</code>
</pre>
```
@druid.server.type@ should be set to "historical" for your compute nodes and realtime for the realtime nodes. The master will only assign segments to a "historical" node and the broker has some intelligence around its ability to cache results when talking to a realtime node. This does not need to be set for the master or the broker.
`druid.server.type` should be set to "historical" for your compute nodes and realtime for the realtime nodes. The master will only assign segments to a "historical" node and the broker has some intelligence around its ability to cache results when talking to a realtime node. This does not need to be set for the master or the broker.
@druid.host@ should be set to the hostname and port that can be used to talk to the given server process. Basically, someone should be able to send a request to http://${druid.host}/ and actually talk to the process.
`druid.host` should be set to the hostname and port that can be used to talk to the given server process. Basically, someone should be able to send a request to http://${druid.host}/ and actually talk to the process.
@druid.port@ should be set to the port that the server should listen on. In the vast majority of cases, this port should be the same as what is on @druid.host@.
`druid.port` should be set to the port that the server should listen on. In the vast majority of cases, this port should be the same as what is on `druid.host`.
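Putting those three properties together, a single historical (compute) process might be configured along these lines; the host, IP, and port values are placeholders for illustration only.

```
# Illustrative per-process settings for one compute/historical node
druid.server.type=historical
druid.host=10.0.0.21:8081
druid.port=8081
```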
h2. Build/Run
Build/Run
---------
The simplest way to build and run from the repository is to run @mvn package@ from the base directory and then take @druid-services/target/druid-services-*-selfcontained.jar@ and push that around to your machines; the jar does not need to be expanded, and since it contains the main() methods for each kind of service, it is *not* invoked with java -jar. It can be run from a normal java command-line by just including it on the classpath and then giving it the main class that you want to run. For example one instance of the Compute node/service can be started like this:
The simplest way to build and run from the repository is to run `mvn package` from the base directory and then take `druid-services/target/druid-services-*-selfcontained.jar` and push that around to your machines; the jar does not need to be expanded, and since it contains the main() methods for each kind of service, it is *not* invoked with java -jar. It can be run from a normal java command-line by just including it on the classpath and then giving it the main class that you want to run. For example one instance of the Historical node/service can be started like this:
<pre>
<code>
java -Duser.timezone=UTC -Dfile.encoding=UTF-8 -cp compute/:druid-services/target/druid-services-*-selfcontained.jar com.metamx.druid.http.ComputeMain
</code>
</pre>
The following table shows the possible services and fully qualified class for main().
```
java -Duser.timezone=UTC -Dfile.encoding=UTF-8 -cp services/target/druid-services-*-selfcontained.jar io.druid.cli.Main server historical
```
|_. service |_. main class |
| [[ Realtime ]] | com.metamx.druid.realtime.RealtimeMain |
| [[ Master ]] | com.metamx.druid.http.MasterMain |
| [[ Broker ]] | com.metamx.druid.http.BrokerMain |
| [[ Compute ]] | com.metamx.druid.http.ComputeMain |
\ No newline at end of file
All Druid server nodes can be started with:
```
io.druid.cli.Main server <node_type>
```
The table below shows the program arguments for the different node types.
|service|program arguments|
|-------|----------------|
|Realtime|realtime|
|Coordinator|coordinator|
|Broker|broker|
|Historical|historical|
\ No newline at end of file
This diff has been collapsed.
......@@ -2,7 +2,7 @@
layout: doc_page
---
Coordinator
======
===========
The Druid coordinator node is primarily responsible for segment management and distribution. More specifically, the Druid coordinator node communicates to historical nodes to load or drop segments based on configurations. The Druid coordinator is responsible for loading new segments, dropping outdated segments, managing segment replication, and balancing segment load.
......@@ -10,6 +10,96 @@ The Druid coordinator runs periodically and the time between each run is a confi
Before any unassigned segments are serviced by historical nodes, the available historical nodes for each tier are first sorted in terms of capacity, with least capacity servers having the highest priority. Unassigned segments are always assigned to the nodes with least capacity to maintain a level of balance between nodes. The coordinator does not directly communicate with a historical node when assigning it a new segment; instead the coordinator creates some temporary information about the new segment under load queue path of the historical node. Once this request is seen, the historical node will load the segment and begin servicing it.
Quick Start
-----------
Run:
```
io.druid.cli.Main server coordinator
```
With the following JVM configuration:
```
-server
-Xmx256m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
druid.host=localhost
druid.service=coordinator
druid.port=8082
druid.zk.service.host=localhost
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
druid.coordinator.startDelay=PT60s
```
JVM Configuration
-----------------
The coordinator module uses several of the default modules in [Configuration](Configuration.html) and has the following set of configurations as well:
|Property|Description|Default|
|--------|-----------|-------|
|`druid.coordinator.period`|The run period for the coordinator. The coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of segments available and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|PT60S|
|`druid.coordinator.period.indexingPeriod`|How often to send indexing tasks to the indexing service. Only applies if merge or conversion is turned on.|PT1800S (30 mins)|
|`druid.coordinator.removedSegmentLifetime`|When a node disappears, the coordinator can provide a grace period for how long it waits before deciding that the node really isn’t going to come back and it really should declare that all segments from that node are no longer available. This sets that grace period in number of runs of the coordinator.|1|
|`druid.coordinator.startDelay`|The operation of the Coordinator works on the assumption that it has an up-to-date view of the state of the world when it runs; the current ZK interaction code, however, is written in a way that doesn’t allow the Coordinator to know for a fact that it’s done loading the current state of the world. This delay is a hack to give it enough time to believe that it has all the data.|PT300S|
|`druid.coordinator.merge.on`|Boolean flag for whether or not the coordinator should try and merge small segments into a more optimal segment size.|false|
|`druid.coordinator.conversion.on`|Boolean flag for converting old segment indexing versions to the latest segment indexing version.|false|
|`druid.coordinator.load.timeout`|The timeout duration for when the coordinator assigns a segment to a historical node.|15 minutes|
|`druid.manager.segment.pollDuration`|The duration between polls the Coordinator does for updates to the set of active segments. Generally defines the amount of lag time it can take for the coordinator to notice new segments.|PT1M|
|`druid.manager.rules.pollDuration`|The duration between polls the Coordinator does for updates to the set of active rules. Generally defines the amount of lag time it can take for the coordinator to notice rules.|PT1M|
|`druid.manager.rules.defaultTier`|The default tier from which default rules will be loaded.|_default|
Dynamic Configuration
---------------------
The coordinator has dynamic configuration to change certain behaviour on the fly. The coordinator reads a JSON spec object from the Druid [MySQL](MySQL.html) config table. This object is detailed below:
It is recommended that you use the Coordinator Console to configure these parameters. However, if you need to do it via HTTP, the JSON object can be submitted to the coordinator via a POST request at:
```
http://<COORDINATOR_IP>:<PORT>/coordinator/config
```
A sample coordinator dynamic config is shown below:
```json
{
"millisToWaitBeforeDeleting": 900000,
"mergeBytesLimit": 100000000L,
"mergeSegmentsLimit" : 1000,
"maxSegmentsToMove": 5,
"replicantLifetime": 15,
"replicationThrottleLimit": 10,
"emitBalancingStats": false
}
```
Issuing a GET request at the same URL will return the spec that is currently in place. A description of the config setup spec is shown below.
|Property|Description|Default|
|--------|-----------|-------|
|`millisToWaitBeforeDeleting`|How long the coordinator needs to be active before it can start deleting segments.|900000 (15 mins)|
|`mergeBytesLimit`|The maximum number of bytes to merge (for segments).|100000000L|
|`mergeSegmentsLimit`|The maximum number of segments that can be in a single merge [task](Tasks.html).|Integer.MAX_VALUE|
|`maxSegmentsToMove`|The maximum number of segments that can be moved at any given time.|5|
|`replicantLifetime`|The maximum number of coordinator runs for a segment to be replicated before we start alerting.|15|
|`replicationThrottleLimit`|The maximum number of segments that can be replicated at one time.|10|
|`emitBalancingStats`|Boolean flag for whether or not we should emit balancing stats. This is an expensive operation.|false|
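A hedged example of exercising this endpoint with curl is shown below. The URL is the one given earlier in this section; the curl flags and the idea of saving the JSON to a file are illustrative.

```
# POST the dynamic config (saved as coordinator_dynamic.json) to the coordinator
curl -X POST -H 'Content-Type: application/json' \
  -d @coordinator_dynamic.json \
  http://<COORDINATOR_IP>:<PORT>/coordinator/config

# A GET at the same URL returns the spec currently in place
curl http://<COORDINATOR_IP>:<PORT>/coordinator/config
```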
### Running
```
io.druid.cli.Main server coordinator
```
Rules
-----
......@@ -104,7 +194,13 @@ The coordinator node exposes several HTTP endpoints for interactions.
The Coordinator Console
------------------
The Druid coordinator exposes a web GUI for displaying cluster information and rule configuration. After the coordinator starts, the console can be accessed at http://<HOST>:<PORT>. There exists a full cluster view, as well as views for individual historical nodes, datasources and segments themselves. Segment information can be displayed in raw JSON form or as part of a sortable and filterable table.
The Druid coordinator exposes a web GUI for displaying cluster information and rule configuration. After the coordinator starts, the console can be accessed at:
```
http://<COORDINATOR_IP>:<COORDINATOR_PORT>
```
There exists a full cluster view, as well as views for individual historical nodes, datasources and segments themselves. Segment information can be displayed in raw JSON form or as part of a sortable and filterable table.
The coordinator console also exposes an interface to creating and editing rules. All valid datasources configured in the segment database, along with a default datasource, are available for configuration. Rules of different types can be added, deleted or edited.
......@@ -123,14 +219,4 @@ FAQ
No. If the Druid coordinator is not started up, no new segments will be loaded in the cluster and outdated segments will not be dropped. However, the coordinator node can be started up at any time, and after a configurable delay, will start running coordinator tasks.
This also means that if you have a working cluster and all of your coordinators die, the cluster will continue to function, it just won’t experience any changes to its data topology.
Running
-------
Coordinator nodes can be run using the `io.druid.cli.Main` class with program parameters "server coordinator".
Configuration
-------------
See [Configuration](Configuration.html).
This also means that if you have a working cluster and all of your coordinators die, the cluster will continue to function, it just won’t experience any changes to its data topology.
\ No newline at end of file
......@@ -6,6 +6,49 @@ Historical
Historical nodes load up historical segments and expose them for querying.
Quick Start
-----------
Run:
```
io.druid.cli.Main server historical
```
With the following JVM configuration:
```
-server
-Xmx256m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
druid.host=localhost
druid.service=historical
druid.port=8081
druid.zk.service.host=localhost
druid.server.maxSize=100000000
druid.processing.buffer.sizeBytes=10000000
druid.segmentCache.infoPath=/tmp/druid/segmentInfoCache
druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize"\: 100000000}]
```
Note: This will spin up a Historical node with the local filesystem as deep storage.
JVM Configuration
-----------------
The historical module uses several of the default modules in [Configuration](Configuration.html) and has no unique configs of its own.
Running
-------
```
io.druid.cli.Main server historical
```
Loading and Serving Segments
----------------------------
......@@ -27,14 +70,4 @@ Querying Segments
Please see [Querying](Querying.html) for more information on querying historical nodes.
For every query that a historical node services, it will log the query and report metrics on the time taken to run the query.
Running
-------
p
Historical nodes can be run using the `io.druid.cli.Main` class with program arguments "server historical".
Configuration
-------------
See [Configuration](Configuration.html).
For every query that a historical node services, it will log the query and report metrics on the time taken to run the query.
\ No newline at end of file
......@@ -49,12 +49,10 @@ The truth is, the indexing service is an experience that is difficult to charact
The indexing service is philosophical transcendence, an infallible truth that will shape your soul, mold your character, and define your reality. The indexing service is creating world peace, playing with puppies, unwrapping presents on Christmas morning, cradling a loved one, and beating Goro in Mortal Kombat for the first time. The indexing service is sustainable economic growth, global propensity, and a world of transparent financial transactions. The indexing service is a true belieber. The indexing service is panicking because you forgot you signed up for a course and the big exam is in a few minutes, only to wake up and realize it was all a dream. What is the indexing service? More like what isn’t the indexing service. The indexing service is here and it is ready, but are you?
-->
<!--
Indexing Service Overview
-------------------------
We need one of these.
-->
![Indexing Service](../img/indexing_service.png "Indexing Service")
Overlord Node
-------------
......@@ -105,7 +103,7 @@ If autoscaling is enabled, new middle managers may be added when a task has been
#### JVM Configuration
The overlord module requires the following basic configs to run in remote mode:
In addition to the configuration of some of the default modules in [Configuration](Configuration.html), the overlord module requires the following basic configs to run in remote mode:
|Property|Description|Default|
|--------|-----------|-------|
......
......@@ -6,22 +6,46 @@ Realtime
Realtime nodes provide a realtime index. Data indexed via these nodes is immediately available for querying. Realtime nodes will periodically build segments representing the data they’ve collected over some span of time and hand these segments off to [Historical](Historical.html) nodes.
Running
-------
Quick Start
-----------
Run:
Realtime nodes can be run using the `com.metamx.druid.realtime.RealtimeMain` class.
```
io.druid.cli.Main server realtime
```
Segment Propagation
-------------------
With the following JVM configuration:
The segment propagation diagram for real-time data ingestion can be seen below:
```
-server
-Xmx256m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
![Segment Propagation](https://raw.github.com/metamx/druid/druid-0.5.4/doc/segment_propagation.png "Segment Propagation")
druid.host=localhost
druid.service=realtime
druid.port=8083
druid.zk.service.host=localhost
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
druid.processing.buffer.sizeBytes=10000000
```
Note: This setup will not hand off segments to the rest of the cluster.
Configuration
-------------
JVM Configuration
-----------------
Realtime nodes take a mix of base server configuration and spec files that describe how to connect, process and expose the realtime feed. See [Configuration](Configuration.html) for information about general server configuration.
The realtime module uses several of the default modules in [Configuration](Configuration.html) and has the following set of configurations as well:
|Property|Description|Default|
|--------|-----------|-------|
|`druid.realtime.specFile`|The file with realtime specifications in it.|none|
|`druid.publish.type`|Choices: noop, db. After a real-time node completes building a segment after the window period, what does it do with it? For true handoff to occur, this should be set to "db".|noop|
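For instance, a realtime runtime.properties that enables true handoff might add the following to the quick-start config above. The spec file path is a placeholder, and the database values simply echo the quick-start block; this is a sketch, not a recommended production setup.

```
# Sketch: publish segment metadata to the db so handoff can occur
druid.realtime.specFile=/path/to/realtime.spec
druid.publish.type=db
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
```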
### Realtime "specFile"
......@@ -138,6 +162,20 @@ The normal, expected use cases have the following overall constraints: `indexGra
If the RealtimeNode process runs out of heap, try adjusting the `druid.computation.buffer.size` property, which specifies a size in bytes that must fit into the heap.
Running
-------
```
io.druid.cli.Main server realtime
```
Segment Propagation
-------------------
The segment propagation diagram for real-time data ingestion can be seen below:
![Segment Propagation](https://raw.github.com/metamx/druid/druid-0.5.4/doc/segment_propagation.png "Segment Propagation")
Requirements
------------
......
......@@ -74,6 +74,8 @@ The Hadoop Index Task is used to index larger data sets that require the paralle
|type|The task type, this should always be "index_hadoop".|yes|
|config|See [Batch Ingestion](Batch-ingestion.html)|yes|
The Hadoop Index Config submitted as part of a Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopBatchIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `updaterJobSpec`. The Indexing Service takes care of setting these fields internally.
#### Realtime Index Task
The indexing service can also run real-time tasks. These tasks effectively transform a middle manager into a real-time node. We introduced real-time tasks as a way to programmatically add new real-time data sources without needing to manually add nodes. The grammar for the real-time task is as follows:
......
---
layout: doc_page
---
In our last [tutorial](Tutorial%3A-The-Druid-Cluster.html), we setup a complete Druid cluster. We created all the Druid dependencies and loaded some batched data. Druid shards data into self-contained chunks known as [segments](Segments.html). Segments are the fundamental unit of storage in Druid and all Druid nodes only understand segments.
In our last [tutorial](Tutorial%3A-The-Druid-Cluster.html), we set up a complete Druid cluster. We created all the Druid dependencies and loaded some batched data. Druid shards data into self-contained chunks known as [segments](Segments.html). Segments are the fundamental unit of storage in Druid and all Druid nodes only understand segments.
In this tutorial, we will learn about batch ingestion (as opposed to real-time ingestion) and how to create segments using the final piece of the Druid Cluster, the [indexing service](Indexing-Service.html). The indexing service is a standalone service that accepts [tasks](Tasks.html) in the form of POST requests. The output of most tasks is a set of segments.
If you are more interested in ingesting your own data into Druid, skip ahead to the next [tutorial](Tutorial%3A-Loading-Your-Data-Part-2.html).
About the data
--------------
......@@ -38,7 +40,7 @@ Metrics (things to aggregate over):
Setting Up
----------
At this point, you should already have Druid downloaded and are comfortable with running a Druid cluster locally. If you are not, stop here and familiarize yourself with the first two tutorials.
At this point, you should already have Druid downloaded and be comfortable with running a Druid cluster locally. If you are not, see [here](Tutorial%3A-The-Druid-Cluster.html).
Let's start from our usual starting point in the tarball directory.
......
......@@ -114,9 +114,6 @@ druid.port=8082
druid.zk.service.host=localhost
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.s3.secretKey=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
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
......@@ -150,6 +147,7 @@ druid.port=8081
druid.zk.service.host=localhost
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
......@@ -220,6 +218,30 @@ When the segment completes downloading and ready for queries, you should see the
At this point, we can query the segment. For more information on querying, see this [link](Querying.html).
### Bonus Round: Start a Realtime Node
To start the realtime node that was used in our first tutorial, you simply have to issue:
```
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=examples/wikipedia/wikipedia_realtime.spec -classpath lib/*:config/realtime io.druid.cli.Main server realtime
```
The configurations are located in `config/realtime/runtime.properties` and should contain the following:
```
druid.host=localhost
druid.service=realtime
druid.port=8083
druid.zk.service.host=localhost
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
druid.processing.buffer.sizeBytes=10000000
```
Next Steps
----------
......
......@@ -6,6 +6,8 @@ Druid uses ZooKeeper (ZK) for management of current cluster state. The operation
1. [Coordinator](Coordinator.html) leader election
2. Segment "publishing" protocol from [Historical](Historical.html) and [Realtime](Realtime.html)
3. Segment load/drop protocol between [Coordinator](Coordinator.html) and [Historical](Historical.html)
4. [Overlord](Indexing-Service.html) leader election
5. [Indexing Service](Indexing-Service.html) task management
### Property Configuration
......@@ -31,6 +33,21 @@ druid.zk.paths.indexer.statusPath=${druid.zk.paths.base}/indexer/status
druid.zk.paths.indexer.leaderLatchPath=${druid.zk.paths.base}/indexer/leaderLatchPath
```
|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.paths.base`|Base Zookeeper path.|druid|
|`druid.zk.paths.propertiesPath`|Zookeeper properties path.|druid/properties|
|`druid.zk.paths.announcementsPath`|Druid node announcement path.|druid/announcements|
|`druid.zk.paths.servedSegmentsPath`|Legacy path for where Druid nodes announce their segments.|druid/servedSegments|
|`druid.zk.paths.liveSegmentsPath`|Current path for where Druid nodes announce their segments.|druid/segments|
|`druid.zk.paths.loadQueuePath`|Entries here cause historical nodes to load and drop segments.|druid/loadQueue|
|`druid.zk.paths.coordinatorPath`|Used by the coordinator for leader election.|druid/coordinator|
|`druid.zk.paths.indexer.announcementsPath`|Middle managers announce themselves here.|druid/indexer/announcements|
|`druid.zk.paths.indexer.tasksPath`|Used to assign tasks to middle managers.|druid/indexer/tasks|
|`druid.zk.paths.indexer.statusPath`|Parent path for announcement of task statuses.|druid/indexer/status|
|`druid.zk.paths.indexer.leaderLatchPath`|Used for Overlord leader election.|druid/indexer/leaderLatchPath|
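As a rough sketch (an assumption about how the defaults derive, based on the `${druid.zk.paths.base}` pattern shown in the properties example above), overriding just the base path shifts the derived defaults with it:

```
# Sketch: non-default base path; per the pattern above, unset child paths
# would then resolve under it, e.g. /druid/prod/announcements
druid.zk.paths.base=/druid/prod
```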
NOTE: We also use Curator’s service discovery module to expose some services via zookeeper. This also uses a zookeeper path, but this path is **not** affected by `druid.zk.paths.base` and **must** be specified separately. This property is
```
......
......@@ -28,7 +28,6 @@ Data Ingestion
* [Realtime](Realtime.html)
* [Batch Ingestion](Batch-ingestion.html)
* [Indexing Service](Indexing-Service.html)
* [Indexing Service](Indexing-Service.html)
*** ]
*** [Tasks](Tasks.html)
----------------------------
......
......@@ -56,6 +56,7 @@ h2. Architecture
** "Realtime":./Realtime.html
*** "Firehose":./Firehose.html
*** "Plumber":./Plumber.html
** "Indexing Service":./Indexing-Service.html
* External Dependencies
** "Deep Storage":./Deep-Storage.html
** "MySQL":./MySQL.html
......
......@@ -4,9 +4,6 @@ druid.port=8082
druid.zk.service.host=localhost
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.s3.secretKey=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
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
......
......@@ -4,6 +4,7 @@ druid.port=8081
druid.zk.service.host=localhost
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
......
......@@ -4,9 +4,6 @@ druid.port=8083
druid.zk.service.host=localhost
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.s3.secretKey=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
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd
......
......@@ -26,7 +26,7 @@
<description>druid-examples</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -18,7 +18,8 @@
~ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.druid</groupId>
<artifactId>druid-hdfs-storage</artifactId>
......@@ -26,7 +27,7 @@
<description>druid-hdfs-storage</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -26,7 +26,7 @@
<description>Druid Indexing Hadoop</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -26,7 +26,7 @@
<description>druid-indexing-service</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -84,7 +84,7 @@ public class RemoteTaskActionClient implements TaskActionClient
final String response;
log.info("Submitting action for task[%s] to coordinator[%s]: %s", task.getId(), serviceUri, taskAction);
log.info("Submitting action for task[%s] to overlord[%s]: %s", task.getId(), serviceUri, taskAction);
try {
response = httpClient.post(serviceUri.toURL())
......
......@@ -20,7 +20,7 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<packaging>pom</packaging>
<version>0.6.0-SNAPSHOT</version>
......
......@@ -26,7 +26,7 @@
<description>A module that is everything required to understands Druid Segments</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -26,7 +26,7 @@
<description>druid-s3-extensions</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
io.druid.storage.cassandra.S3StorageDruidModule
\ No newline at end of file
io.druid.storage.s3.S3StorageDruidModule
\ No newline at end of file
......@@ -27,7 +27,7 @@
<description>Druid Server</description>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -29,6 +29,7 @@ public class BalancerSegmentHolder
private final DruidServer fromServer;
private final DataSegment segment;
// This is a pretty fugly hard coding of the maximum lifetime
private volatile int lifetime = 15;
public BalancerSegmentHolder(
......
......@@ -493,7 +493,7 @@ public class DruidCoordinator
)
)
),
config.getCoordinatorSegmentMergerPeriod()
config.getCoordinatorIndexingPeriod()
)
);
}
......
......@@ -38,9 +38,9 @@ public abstract class DruidCoordinatorConfig
@Default("PT60s")
public abstract Duration getCoordinatorPeriod();
@Config("druid.coordinator.period.segmentMerger")
@Config("druid.coordinator.period.indexingPeriod")
@Default("PT1800s")
public abstract Duration getCoordinatorSegmentMergerPeriod();
public abstract Duration getCoordinatorIndexingPeriod();
@Config("druid.coordinator.merge.on")
public boolean isMergeSegments()
......
......@@ -44,7 +44,6 @@ public class EmittingRequestLoggerProvider implements RequestLoggerProvider
@Inject
public void injectMe(Injector injector)
{
System.out.println("YAYAYAYAYAYA!!!");
}
@Override
......
......@@ -89,7 +89,7 @@ public class DruidCoordinatorTest
}
@Override
public Duration getCoordinatorSegmentMergerPeriod()
public Duration getCoordinatorIndexingPeriod()
{
return null;
}
......
......@@ -17,16 +17,16 @@
~ along with this program; if not, write to the Free Software
~ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.druid</groupId>
<artifactId>druid-services</artifactId>
<name>druid-services</name>
<description>druid-services</description>
<version>0.6.0-SNAPSHOT</version>
<parent>
<groupId>com.metamx</groupId>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.0-SNAPSHOT</version>
</parent>
......
......@@ -46,12 +46,6 @@ public class CliInternalHadoopIndexer implements Runnable
public void run()
{
try {
System.out.println(
HadoopDruidIndexerConfig.jsonMapper.writeValueAsString(
new SingleDimensionShardSpec("billy", "a", "b", 1)
)
);
final HadoopDruidIndexerJob job = new HadoopDruidIndexerJob(getHadoopDruidIndexerConfig());
job.run();
}
......
......@@ -21,9 +21,12 @@ package io.druid.cli;
import com.fasterxml.jackson.databind.jsontype.NamedType;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.google.api.client.util.Lists;
import com.google.common.base.Joiner;
import com.google.common.base.Throwables;
import com.google.common.collect.ImmutableList;
import com.google.inject.Binder;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.Key;
import com.google.inject.multibindings.MapBinder;
......@@ -40,6 +43,7 @@ import io.druid.guice.LifecycleModule;
import io.druid.guice.ManageLifecycle;
import io.druid.guice.NodeTypeConfig;
import io.druid.guice.PolyBind;
import io.druid.indexer.Utils;
import io.druid.indexing.common.RetryPolicyConfig;
import io.druid.indexing.common.RetryPolicyFactory;
import io.druid.indexing.common.TaskToolboxFactory;
......@@ -62,15 +66,20 @@ import io.druid.indexing.worker.executor.ChatHandlerResource;
import io.druid.indexing.worker.executor.ExecutorLifecycle;
import io.druid.indexing.worker.executor.ExecutorLifecycleConfig;
import io.druid.initialization.DruidModule;
import io.druid.initialization.Initialization;
import io.druid.query.QuerySegmentWalker;
import io.druid.segment.loading.DataSegmentKiller;
import io.druid.segment.loading.OmniDataSegmentKiller;
import io.druid.segment.loading.SegmentLoaderConfig;
import io.druid.segment.loading.StorageLocationConfig;
import io.druid.server.initialization.ExtensionsConfig;
import io.druid.server.initialization.JettyServerInitializer;
import io.tesla.aether.internal.DefaultTeslaAether;
import org.eclipse.jetty.server.Server;
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;
......
......@@ -61,7 +61,7 @@ public class ConvertProperties implements Runnable
new Rename("com.metamx.emitter.logging", "druid.emitter.logging"),
new Rename("com.metamx.emitter.logging.level", "druid.emitter.logging.logLevel"),
new Rename("com.metamx.emitter.http", "druid.emitter.http"),
new Rename("com.metamx.emitter.http.url", "druid.emitter.http.url"),
new Rename("com.metamx.emitter.http.url", "druid.emitter.http.recipientBaseUrl"),
new Rename("com.metamx.druid.emitter.period", "druid.emitter.emissionPeriod"),
new PrefixRename("com.metamx.emitter", "druid.emitter"),
new PrefixRename("com.metamx.druid.emitter", "druid.emitter"),
......@@ -108,6 +108,7 @@ public class ConvertProperties implements Runnable
new Rename("druid.worker.taskActionClient.retry.minWaitMillis", "druid.worker.taskActionClient.retry.minWait"),
new Rename("druid.worker.taskActionClient.retry.maxWaitMillis", "druid.worker.taskActionClient.retry.maxWait"),
new Rename("druid.master.merger.service", "druid.selectors.indexing.serviceName"),
new Rename("druid.master.period.segmentMerger", "druid.coordinator.period.indexingPeriod"),
new Rename("druid.master.merger.on", "druid.coordinator.merge.on"),
new PrefixRename("druid.master", "druid.coordinator"),
new PrefixRename("druid.pusher", "druid.storage"),
......