Commit 476ce867 authored by Igal Levy

Merge branch 'master' into igalDruid

......@@ -30,4 +30,4 @@ echo "For examples, see: "
echo " "
ls -1 examples/*/*sh
echo " "
echo "See also http://druid.io/docs/0.6.26"
echo "See also http://druid.io/docs/0.6.30"
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -53,4 +53,20 @@
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
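
The `maven-jar-plugin` configuration added here (and repeated in the other modules below) asks Maven to write the standard `Implementation-*` and `Specification-*` attributes into each jar's `MANIFEST.MF`. A minimal sketch, assuming a class loaded from such a jar, of how those attributes surface at runtime through `java.lang.Package`; the status endpoint changes later in this commit rely on exactly this:

```java
// Sketch only: prints the manifest attributes written by
// addDefaultImplementationEntries / addDefaultSpecificationEntries.
public class ManifestVersionDemo
{
  public static void main(String[] args)
  {
    final Package pkg = ManifestVersionDemo.class.getPackage();
    // Each call returns null unless the class was loaded from a jar
    // whose manifest carries the corresponding attribute.
    System.out.println(pkg.getImplementationTitle());   // Implementation-Title
    System.out.println(pkg.getImplementationVersion()); // e.g. 0.6.31-SNAPSHOT
    System.out.println(pkg.getSpecificationVersion());  // Specification-Version
  }
}
```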
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -168,7 +168,15 @@
</goals>
</execution>
</executions>
</plugin>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
......@@ -3,7 +3,7 @@ layout: doc_page
---
# Booting a Single Node Cluster #
[Loading Your Data](Tutorial%3A-Loading-Your-Data-Part-2.html) and [All About Queries](Tutorial%3A-All-About-Queries.html) contain recipes to boot a small Druid cluster on localhost. Here we will boot a small cluster on EC2. You can check out the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.6.26-bin.tar.gz).
[Loading Your Data](Tutorial%3A-Loading-Your-Data-Part-2.html) and [All About Queries](Tutorial%3A-All-About-Queries.html) contain recipes to boot a small Druid cluster on localhost. Here we will boot a small cluster on EC2. You can check out the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.6.30-bin.tar.gz).
The [ec2 run script](https://github.com/metamx/druid/blob/master/examples/bin/run_ec2.sh), run_ec2.sh, is located at 'examples/bin' if you have checked out the code, or at the root of the project if you've downloaded a tarball. The scripts rely on the [Amazon EC2 API Tools](http://aws.amazon.com/developertools/351), and you will need to set three environment variables:
......
......@@ -16,7 +16,7 @@ More definitions are available on the [design page](Design.html).
* **Dimensions** Aspects or categories of data, such as languages or locations. For example, with *language* and *country* as the type of dimension, values could be "English" or "Mandarin" for language, or "USA" or "China" for country. In Druid, dimensions can serve as filters for narrowing down hits (for example, language = "English" or country = "China").
* **Ephemeral Node** A Zookeeper node (or "znode") that exists only for the time it is needed to complete the process for which it was created. In a Druid cluster, ephemeral nodes are typically used in work such as assigning [segments](#segment) to certain nodes.
* **Ephemeral Node** A Zookeeper node (or "znode") that exists for as long as the session that created the znode is active. More info [here](http://zookeeper.apache.org/doc/r3.2.1/zookeeperProgrammers.html#Ephemeral+Nodes). In a Druid cluster, ephemeral nodes are typically used for commands (such as assigning [segments](#segment) to certain nodes).
* **Granularity** The time interval corresponding to aggregation by time. Druid configuration settings specify the granularity of [timestamp](#timestamp) buckets in a [segment](#segment) (for example, by minute or by hour), as well as the granularity of the segment itself. The latter is essentially the overall range of absolute time covered by the segment. In queries, granularity settings control the summarization of findings.
......
......@@ -19,13 +19,13 @@ Clone Druid and build it:
git clone https://github.com/metamx/druid.git druid
cd druid
git fetch --tags
git checkout druid-0.6.26
git checkout druid-0.6.30
./build.sh
```
### Downloading the DSK (Druid Standalone Kit)
[Download](http://static.druid.io/artifacts/releases/druid-services-0.6.26-bin.tar.gz) a stand-alone tarball and run it:
[Download](http://static.druid.io/artifacts/releases/druid-services-0.6.30-bin.tar.gz) a stand-alone tarball and run it:
``` bash
tar -xzf druid-services-0.X.X-bin.tar.gz
......
......@@ -27,7 +27,7 @@ druid.host=localhost
druid.service=realtime
druid.port=8083
druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.26"]
druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.30"]
druid.zk.service.host=localhost
......
......@@ -49,7 +49,7 @@ There are two ways to setup Druid: download a tarball, or [Build From Source](Bu
### Download a Tarball
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.26-bin.tar.gz). Download this file to a directory of your choosing.
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.30-bin.tar.gz). Download this file to a directory of your choosing.
You can extract the awesomeness within by issuing:
......@@ -60,7 +60,7 @@ tar -zxvf druid-services-*-bin.tar.gz
Not too lost so far right? That's great! If you cd into the directory:
```
cd druid-services-0.6.26
cd druid-services-0.6.30
```
You should see a bunch of files:
......
......@@ -44,7 +44,7 @@ With real-world data, we recommend having a message bus such as [Apache Kafka](h
#### Setting up Kafka
[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/druid-0.6.26/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how Druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in real-time without writing any code. To load data to a real-time node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/druid-0.6.30/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how Druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in real-time without writing any code. To load data to a real-time node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
Instructions for booting a Zookeeper and then Kafka cluster are available [here](http://kafka.apache.org/07/quickstart.html).
......
......@@ -13,7 +13,7 @@ In this tutorial, we will set up other types of Druid nodes as well as external dependencies
If you followed the first tutorial, you should already have Druid downloaded. If not, let's go back and do that first.
You can download the latest version of Druid [here](http://static.druid.io/artifacts/releases/druid-services-0.6.26-bin.tar.gz)
You can download the latest version of Druid [here](http://static.druid.io/artifacts/releases/druid-services-0.6.30-bin.tar.gz)
and untar the contents within by issuing:
......@@ -149,7 +149,7 @@ druid.port=8081
druid.zk.service.host=localhost
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.26"]
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.30"]
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
......@@ -238,7 +238,7 @@ druid.port=8083
druid.zk.service.host=localhost
druid.extensions.coordinates=["io.druid.extensions:druid-examples:0.6.26","io.druid.extensions:druid-kafka-seven:0.6.26"]
druid.extensions.coordinates=["io.druid.extensions:druid-examples:0.6.30","io.druid.extensions:druid-kafka-seven:0.6.30"]
# Change this config to db to hand off to the rest of the Druid cluster
druid.publish.type=noop
......
......@@ -37,7 +37,7 @@ There are two ways to setup Druid: download a tarball, or [Build From Source](Bu
h3. Download a Tarball
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.26-bin.tar.gz)
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.30-bin.tar.gz)
Download this file to a directory of your choosing.
You can extract the awesomeness within by issuing:
......@@ -48,7 +48,7 @@ tar zxvf druid-services-*-bin.tar.gz
Not too lost so far right? That's great! If you cd into the directory:
```
cd druid-services-0.6.26
cd druid-services-0.6.30
```
You should see a bunch of files:
......
......@@ -9,7 +9,7 @@ There are two ways to setup Druid: download a tarball, or build it from source.
h3. Download a Tarball
We've built a tarball that contains everything you'll need. You'll find it "here":http://static.druid.io/artifacts/releases/druid-services-0.6.26-bin.tar.gz.
We've built a tarball that contains everything you'll need. You'll find it "here":http://static.druid.io/artifacts/releases/druid-services-0.6.30-bin.tar.gz.
Download this bad boy to a directory of your choosing.
You can extract the awesomeness within by issuing:
......
......@@ -4,7 +4,7 @@ druid.port=8081
druid.zk.service.host=localhost
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.26"]
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.30"]
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
......
......@@ -4,7 +4,7 @@ druid.port=8083
druid.zk.service.host=localhost
druid.extensions.coordinates=["io.druid.extensions:druid-examples:0.6.26","io.druid.extensions:druid-kafka-seven:0.6.26","io.druid.extensions:druid-rabbitmq:0.6.26"]
druid.extensions.coordinates=["io.druid.extensions:druid-examples:0.6.30","io.druid.extensions:druid-kafka-seven:0.6.30","io.druid.extensions:druid-rabbitmq:0.6.30"]
# Change this config to db to hand off to the rest of the Druid cluster
druid.publish.type=noop
......
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -104,6 +104,14 @@
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
......
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -71,4 +71,20 @@
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -101,6 +101,17 @@
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<executions>
......
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -160,4 +160,20 @@
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
......@@ -23,22 +23,27 @@ import com.google.inject.Binder;
import com.google.inject.Key;
import com.google.inject.Module;
import com.google.inject.multibindings.MapBinder;
import io.druid.indexing.common.config.FileTaskLogsConfig;
import io.druid.indexing.common.tasklogs.FileTaskLogs;
import io.druid.tasklogs.NoopTaskLogs;
import io.druid.tasklogs.TaskLogPusher;
import io.druid.tasklogs.TaskLogs;
/**
*/
public class TaskLogsModule implements Module
public class IndexingServiceTaskLogsModule implements Module
{
@Override
public void configure(Binder binder)
{
PolyBind.createChoice(binder, "druid.indexer.logs.type", Key.get(TaskLogs.class), Key.get(NoopTaskLogs.class));
PolyBind.createChoice(binder, "druid.indexer.logs.type", Key.get(TaskLogs.class), Key.get(FileTaskLogs.class));
final MapBinder<String, TaskLogs> taskLogBinder = Binders.taskLogsBinder(binder);
taskLogBinder.addBinding("noop").to(NoopTaskLogs.class).in(LazySingleton.class);
taskLogBinder.addBinding("file").to(FileTaskLogs.class).in(LazySingleton.class);
binder.bind(NoopTaskLogs.class).in(LazySingleton.class);
binder.bind(FileTaskLogs.class).in(LazySingleton.class);
JsonConfigProvider.bind(binder, "druid.indexer.logs", FileTaskLogsConfig.class);
binder.bind(TaskLogPusher.class).to(TaskLogs.class);
}
......
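
The renamed module wires task-log implementations through Druid's `PolyBind` on top of a Guice `MapBinder`, so the `druid.indexer.logs.type` runtime property selects the implementation, with `FileTaskLogs` now the default. A generic sketch of the underlying pattern in plain Guice (the Druid class names match the diff above; the demo wiring itself is illustrative, not Druid's actual PolyBind code):

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Key;
import com.google.inject.TypeLiteral;
import com.google.inject.multibindings.MapBinder;
import io.druid.indexing.common.config.FileTaskLogsConfig;
import io.druid.indexing.common.tasklogs.FileTaskLogs;
import io.druid.tasklogs.NoopTaskLogs;
import io.druid.tasklogs.TaskLogs;

import java.util.Map;

public class TaskLogsChoiceDemo
{
  public static void main(String[] args)
  {
    final Injector injector = Guice.createInjector(
        new AbstractModule()
        {
          @Override
          protected void configure()
          {
            bind(FileTaskLogsConfig.class).toInstance(new FileTaskLogsConfig());
            final MapBinder<String, TaskLogs> taskLogBinder =
                MapBinder.newMapBinder(binder(), String.class, TaskLogs.class);
            taskLogBinder.addBinding("noop").to(NoopTaskLogs.class);
            taskLogBinder.addBinding("file").to(FileTaskLogs.class);
          }
        }
    );

    // PolyBind reads the property from runtime.properties; "file" is the new default.
    final Map<String, TaskLogs> impls = injector.getInstance(
        Key.get(new TypeLiteral<Map<String, TaskLogs>>() {})
    );
    final TaskLogs chosen = impls.get(System.getProperty("druid.indexer.logs.type", "file"));
    System.out.println(chosen.getClass().getName());
  }
}
```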
/*
* Druid - a distributed column store.
* Copyright (C) 2012, 2013 Metamarkets Group Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
package io.druid.indexing.common.config;
import com.fasterxml.jackson.annotation.JsonProperty;
import javax.validation.constraints.NotNull;
import java.io.File;
public class FileTaskLogsConfig
{
@JsonProperty
@NotNull
private File directory = new File("log");
public File getDirectory()
{
return directory;
}
}
/*
* Druid - a distributed column store.
* Copyright (C) 2012, 2013 Metamarkets Group Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
package io.druid.indexing.common.tasklogs;
import com.google.common.base.Optional;
import com.google.common.io.ByteStreams;
import com.google.common.io.Files;
import com.google.common.io.InputSupplier;
import com.google.inject.Inject;
import com.metamx.common.logger.Logger;
import io.druid.indexing.common.config.FileTaskLogsConfig;
import io.druid.tasklogs.TaskLogs;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
public class FileTaskLogs implements TaskLogs
{
private static final Logger log = new Logger(FileTaskLogs.class);
private final FileTaskLogsConfig config;
@Inject
public FileTaskLogs(
FileTaskLogsConfig config
)
{
this.config = config;
}
@Override
public void pushTaskLog(final String taskid, File file) throws IOException
{
if (!config.getDirectory().exists()) {
config.getDirectory().mkdir();
}
final File outputFile = fileForTask(taskid);
Files.copy(file, outputFile);
log.info("Wrote task log to: %s", outputFile);
}
@Override
public Optional<InputSupplier<InputStream>> streamTaskLog(final String taskid, final long offset) throws IOException
{
final File file = fileForTask(taskid);
if (file.exists()) {
return Optional.<InputSupplier<InputStream>>of(
new InputSupplier<InputStream>()
{
@Override
public InputStream getInput() throws IOException
{
final InputStream inputStream = new FileInputStream(file);
ByteStreams.skipFully(inputStream, offset);
return inputStream;
}
}
);
} else {
return Optional.absent();
}
}
private File fileForTask(final String taskid)
{
return new File(config.getDirectory(), String.format("%s.log", taskid));
}
}
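
A minimal usage sketch for the two new classes; the task id and source path are illustrative, and in a real deployment the config is populated from the `druid.indexer.logs` properties bound above rather than constructed by hand:

```java
import com.google.common.base.Optional;
import com.google.common.io.ByteStreams;
import com.google.common.io.InputSupplier;
import io.druid.indexing.common.config.FileTaskLogsConfig;
import io.druid.indexing.common.tasklogs.FileTaskLogs;

import java.io.File;
import java.io.InputStream;

public class FileTaskLogsDemo
{
  public static void main(String[] args) throws Exception
  {
    // The default config writes logs under ./log
    final FileTaskLogs taskLogs = new FileTaskLogs(new FileTaskLogsConfig());

    // Copies the finished task's log into the directory as <taskid>.log
    taskLogs.pushTaskLog("example_task", new File("/tmp/example_task.log"));

    // Stream the log back from offset 0.
    final Optional<InputSupplier<InputStream>> log = taskLogs.streamTaskLog("example_task", 0);
    if (log.isPresent()) {
      final InputStream in = log.get().getInput();
      try {
        ByteStreams.copy(in, System.out);
      }
      finally {
        in.close();
      }
    }
  }
}
```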
......@@ -220,7 +220,7 @@ public class DbTaskStorage implements TaskStorage
}
@Override
public List<Task> getRunningTasks()
public List<Task> getActiveTasks()
{
return dbi.withHandle(
new HandleCallback<List<Task>>()
......
......@@ -128,7 +128,7 @@ public class HeapMemoryTaskStorage implements TaskStorage
}
@Override
public List<Task> getRunningTasks()
public List<Task> getActiveTasks()
{
giant.lock();
......
......@@ -100,7 +100,7 @@ public class TaskQueue
// Get all running tasks and their locks
final Multimap<TaskLock, Task> tasksByLock = ArrayListMultimap.create();
for (final Task task : taskStorage.getRunningTasks()) {
for (final Task task : taskStorage.getActiveTasks()) {
try {
final List<TaskLock> taskLocks = taskStorage.getLocks(task.getId());
......
......@@ -77,9 +77,9 @@ public interface TaskStorage
public List<TaskAction> getAuditLogs(String taskid);
/**
* Returns a list of currently-running tasks as stored in the storage facility, in no particular order.
* Returns a list of currently running or pending tasks as stored in the storage facility, in no particular order.
*/
public List<Task> getRunningTasks();
public List<Task> getActiveTasks();
/**
* Returns a list of locks for a particular task.
......
......@@ -38,7 +38,7 @@ public class WorkerConfig
@JsonProperty
@Min(1)
private int capacity = Runtime.getRuntime().availableProcessors() - 1;
private int capacity = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
public String getIp()
{
......
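
The new default guards the single-core case: the old expression could evaluate to zero and trip the `@Min(1)` validation. A tiny illustrative sketch:

```java
public class CapacityDefaultDemo
{
  public static void main(String[] args)
  {
    final int procs = 1; // what availableProcessors() returns on a one-CPU host
    final int oldDefault = procs - 1;              // 0: worker advertises no task slots
    final int newDefault = Math.max(1, procs - 1); // 1: always at least one slot
    System.out.println("old=" + oldDefault + ", new=" + newDefault);
  }
}
```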
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -116,4 +116,20 @@
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -55,4 +55,19 @@
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
......@@ -23,7 +23,7 @@
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<packaging>pom</packaging>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
<name>druid</name>
<description>druid</description>
<scm>
......@@ -41,6 +41,7 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<metamx.java-util.version>0.25.0</metamx.java-util.version>
<apache.curator.version>2.1.0-incubating</apache.curator.version>
<druid.api.version>0.1.5</druid.api.version>
</properties>
<modules>
......@@ -65,7 +66,7 @@
<dependency>
<groupId>io.druid</groupId>
<artifactId>druid-api</artifactId>
<version>0.1.3</version>
<version>${druid.api.version}</version>
</dependency>
<!-- Compile Scope -->
......
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -133,6 +133,14 @@
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
......
......@@ -51,7 +51,8 @@ public class JavascriptDimExtractionFn implements DimExtractionFn
cx = contextFactory.enterContext();
}
return Context.toString(fn.call(cx, scope, scope, new String[]{input}));
final Object res = fn.call(cx, scope, scope, new String[]{input});
return res != null ? Context.toString(res) : null;
}
};
}
......
......@@ -20,6 +20,7 @@
package io.druid.query.extraction.extraction;
import com.google.common.collect.Iterators;
import com.google.common.collect.Lists;
import io.druid.query.extraction.DimExtractionFn;
import io.druid.query.extraction.JavascriptDimExtractionFn;
import org.junit.Assert;
......@@ -52,6 +53,21 @@ public class JavascriptDimExtractionFnTest
}
}
@Test
public void testCastingAndNull()
{
String function = "function(x) {\n x = Number(x);\n if(isNaN(x)) return null;\n return Math.floor(x / 5) * 5;\n}";
DimExtractionFn dimExtractionFn = new JavascriptDimExtractionFn(function);
Iterator<String> it = Iterators.forArray("0", "5", "5", "10", null);
for(String str : Lists.newArrayList("1", "5", "6", "10", "CA")) {
String res = dimExtractionFn.apply(str);
String expected = it.next();
Assert.assertEquals(expected, res);
}
}
@Test
public void testJavascriptRegex()
{
......
......@@ -144,14 +144,14 @@ applications \cite{tschetter2011druid}. In the early days of Metamarkets, we
were focused on building a hosted dashboard that would allow users to arbitrarily
explore and visualize event streams. The data store powering the dashboard
needed to return queries fast enough that the data visualizations built on top
of it could update provide users with an interactive experience.
of it could provide users with an interactive experience.
In addition to the query latency needs, the system had to be multi-tenant and
highly available. The Metamarkets product is used in a highly concurrent
environment. Downtime is costly and many businesses cannot afford to wait if a
system is unavailable in the face of software upgrades or network failure.
Downtime for startups, who often do not have internal operations teams, can
determine whether a business succeeds or fails.
Downtime for startups, who often lack proper internal operations management, can
determine business success or failure.
Finally, another key problem that Metamarkets faced in its early days was to
allow users and alerting systems to be able to make business decisions in
......@@ -170,15 +170,15 @@ analytics platform in multiple companies.
\label{sec:architecture}
A Druid cluster consists of different types of nodes and each node type is
designed to perform a specific set of things. We believe this design separates
concerns and simplifies the complexity of the system. There is minimal
interaction between the different node types and hence, intra-cluster
communication failures have minimal impact on data availability. The different
node types operate fairly independently of each other and to solve complex data
analysis problems, they come together to form a fully working system.
The name Druid comes from the Druid class in many role-playing games: it is a
shape-shifter, capable of taking on many different forms to fulfill various
different roles in a group. The composition of and flow of data in a Druid
cluster are shown in Figure~\ref{fig:cluster}.
concerns and simplifies the complexity of the system. The different node types
operate fairly independently of each other and there is minimal interaction
between them. Hence, intra-cluster communication failures have minimal impact
on data availability. To solve complex data analysis problems, the different
node types come together to form a fully working system. The name Druid comes
from the Druid class in many role-playing games: it is a shape-shifter, capable
of taking on many different forms to fulfill various different roles in a
group. The composition of and flow of data in a Druid cluster are shown in
Figure~\ref{fig:cluster}.
\begin{figure*}
\centering
......@@ -213,10 +213,10 @@ still be queried. Figure~\ref{fig:realtime_flow} illustrates the process.
\begin{figure}
\centering
\includegraphics[width = 2.8in]{realtime_flow}
\caption{Real-time nodes first buffer events in memory. After some period of
time, in-memory indexes are persisted to disk. After another period of time,
all persisted indexes are merged together and handed off. Queries on data hit
the in-memory index and the persisted indexes.}
\caption{Real-time nodes first buffer events in memory. On a periodic basis,
the in-memory index is persisted to disk. On another periodic basis, all
persisted indexes are merged together and handed off. Queries for data will hit the
in-memory index and the persisted indexes.}
\label{fig:realtime_flow}
\end{figure}
......@@ -325,14 +325,14 @@ serves whatever data it finds.
\begin{figure}
\centering
\includegraphics[width = 2.8in]{historical_download}
\caption{Historical nodes download immutable segments from deep storage.}
\includegraphics[width = 2.6in]{historical_download}
\caption{Historical nodes download immutable segments from deep storage. Segments must be loaded in memory before they can be queried.}
\label{fig:historical_download}
\end{figure}
Historical nodes can support read consistency because they only deal with
immutable data. Immutable data blocks also enable a simple parallelization
model: historical nodes can scan and aggregate immutable blocks concurrently
model: historical nodes can concurrently scan and aggregate immutable blocks
without blocking.
\subsubsection{Tiers}
......@@ -385,7 +385,7 @@ caching the results would be unreliable.
\includegraphics[width = 4.5in]{caching}
\caption{Broker nodes cache per segment results. Every Druid query is mapped to
a set of segments. Queries often combine cached segment results with those that
need tobe computed on historical and real-time nodes.}
need to be computed on historical and real-time nodes.}
\label{fig:caching}
\end{figure*}
......@@ -399,7 +399,7 @@ nodes are unable to communicate to Zookeeper, they use their last known view of
the cluster and continue to forward queries to real-time and historical nodes.
Broker nodes make the assumption that the structure of the cluster is the same
as it was before the outage. In practice, this availability model has allowed
our Druid cluster to continue serving queries for several hours while we
our Druid cluster to continue serving queries for a significant period of time while we
diagnosed Zookeeper outages.
\subsection{Coordinator Nodes}
......@@ -564,9 +564,9 @@ In this case, we compress the raw values as opposed to their dictionary
representations.
\subsection{Indices for Filtering Data}
In most real world OLAP workflows, queries are issued for the aggregated
results for some set of metrics where some set of dimension specifications are
met. An example query may ask "How many Wikipedia edits were done by users in
In many real world OLAP workflows, queries are issued for the aggregated
results of some set of metrics where some set of dimension specifications are
met. An example query might ask: "How many Wikipedia edits were done by users in
San Francisco who are also male?". This query is filtering the Wikipedia data
set in Table~\ref{tab:sample_data} based on a Boolean expression of dimension
values. In many real world data sets, dimension columns contain strings and
......@@ -712,7 +712,7 @@ equal to "Ke\$ha". The results will be bucketed by day and will be a JSON array
Druid supports many types of aggregations including double sums, long sums,
minimums, maximums, and several others. Druid also supports complex aggregations
such as cardinality estimation and approxmiate quantile estimation. The
such as cardinality estimation and approximate quantile estimation. The
results of aggregations can be combined in mathematical expressions to form
other aggregations. The query API is highly customizable and can be extended to
filter and group results based on almost any arbitrary condition. It is beyond
......@@ -892,10 +892,9 @@ support computation directly in the storage layer. There are also other data
stores designed for some of the same data warehousing issues that Druid
is meant to solve. These systems include in-memory databases such as
SAP’s HANA \cite{farber2012sap} and VoltDB \cite{voltdb2010voltdb}. These data
stores lack Druid's low latency ingestion characteristics. Similar to
\cite{paraccel2013}, Druid has analytical features built in, however, it is
much easier to do system wide rolling software updates in Druid (with no
downtime).
stores lack Druid's low latency ingestion characteristics. Druid also has
native analytical features baked in, similar to \cite{paraccel2013}; however,
Druid allows system-wide rolling software updates with no downtime.
Druid's low latency data ingestion features share some similarities with
Trident/Storm \cite{marz2013storm} and Streaming Spark
......
......@@ -9,7 +9,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -66,4 +66,19 @@
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
</project>
......@@ -34,7 +34,7 @@ import io.druid.data.input.impl.FileIteratingFirehose;
import io.druid.data.input.impl.StringInputRowParser;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.LineIterator;
import org.jets3t.service.S3Service;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Bucket;
import org.jets3t.service.model.S3Object;
......@@ -55,13 +55,13 @@ public class StaticS3FirehoseFactory implements FirehoseFactory
{
private static final Logger log = new Logger(StaticS3FirehoseFactory.class);
private final S3Service s3Client;
private final RestS3Service s3Client;
private final StringInputRowParser parser;
private final List<URI> uris;
@JsonCreator
public StaticS3FirehoseFactory(
@JacksonInject("s3Client") S3Service s3Client,
@JacksonInject("s3Client") RestS3Service s3Client,
@JsonProperty("parser") StringInputRowParser parser,
@JsonProperty("uris") List<URI> uris
)
......
......@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -220,8 +220,6 @@
<artifactId>caliper</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
......@@ -235,6 +233,14 @@
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.antlr</groupId>
......
......@@ -52,7 +52,6 @@ import io.druid.guice.QueryableModule;
import io.druid.guice.ServerModule;
import io.druid.guice.ServerViewModule;
import io.druid.guice.StorageNodeModule;
import io.druid.guice.TaskLogsModule;
import io.druid.guice.annotations.Client;
import io.druid.guice.annotations.Json;
import io.druid.guice.annotations.Smile;
......@@ -299,7 +298,6 @@ public class Initialization
new JacksonConfigManagerModule(),
new IndexingServiceDiscoveryModule(),
new DataSegmentPusherPullerModule(),
new TaskLogsModule(),
new FirehoseModule()
);
......
......@@ -20,10 +20,16 @@
package io.druid.server;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.inject.Injector;
import io.druid.initialization.DruidModule;
import io.druid.initialization.Initialization;
import io.druid.server.initialization.ExtensionsConfig;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import java.util.ArrayList;
import java.util.List;
/**
*/
......@@ -33,20 +39,60 @@ public class StatusResource
@GET
@Produces("application/json")
public Status doGet()
{
return getStatus();
}
public static Status getStatus()
{
return new Status(
StatusResource.class.getPackage().getImplementationVersion(),
Initialization.class.getPackage().getImplementationVersion(),
getExtensionVersions(),
new Memory(Runtime.getRuntime())
);
}
public static class Status {
/**
* Load the unique extensions and return their implementation-versions
*
* @return map of extensions loaded with their respective implementation versions.
*/
private static List<ModuleVersion> getExtensionVersions()
{
final Injector injector = Initialization.makeStartupInjector();
final ExtensionsConfig config = injector.getInstance(ExtensionsConfig.class);
final List<DruidModule> druidModules = Initialization.getFromExtensions(config, DruidModule.class);
List<ModuleVersion> moduleVersions = new ArrayList<>();
for (DruidModule module : druidModules) {
String artifact = module.getClass().getPackage().getImplementationTitle();
String version = module.getClass().getPackage().getImplementationVersion();
ModuleVersion moduleVersion;
if (artifact != null) {
moduleVersion = new ModuleVersion(module.getClass().getCanonicalName(), artifact, version);
} else {
moduleVersion = new ModuleVersion(module.getClass().getCanonicalName());
}
moduleVersions.add(moduleVersion);
}
return moduleVersions;
}
public static class Status
{
final String version;
final List<ModuleVersion> modules;
final Memory memory;
public Status(String version, Memory memory)
public Status(
String version, List<ModuleVersion> modules, Memory memory
)
{
this.version = version;
this.modules = modules;
this.memory = memory;
}
......@@ -56,20 +102,94 @@ public class StatusResource
return version;
}
@JsonProperty
public List<ModuleVersion> getModules()
{
return modules;
}
@JsonProperty
public Memory getMemory()
{
return memory;
}
@Override
public String toString()
{
final String NL = "\n";
StringBuilder output = new StringBuilder();
output.append(String.format("Druid version - %s", version)).append(NL).append(NL);
if (modules.size() > 0) {
output.append("Registered Druid Modules").append(NL);
} else {
output.append("No Druid Modules loaded !");
}
for (ModuleVersion moduleVersion : modules) {
output.append(moduleVersion).append(NL);
}
return output.toString();
}
}
public static class ModuleVersion
{
final String name;
final String artifact;
final String version;
public ModuleVersion(String name)
{
this(name, "", "");
}
public ModuleVersion(String name, String artifact, String version)
{
this.name = name;
this.artifact = artifact;
this.version = version;
}
@JsonProperty
public String getName()
{
return name;
}
@JsonProperty
public String getArtifact()
{
return artifact;
}
@JsonProperty
public String getVersion()
{
return version;
}
@Override
public String toString()
{
if (artifact.isEmpty()) {
return String.format(" - %s ", name);
} else {
return String.format(" - %s (%s-%s)", name, artifact, version);
}
}
}
public static class Memory {
public static class Memory
{
final long maxMemory;
final long totalMemory;
final long freeMemory;
final long usedMemory;
public Memory(Runtime runtime) {
public Memory(Runtime runtime)
{
maxMemory = runtime.maxMemory();
totalMemory = runtime.totalMemory();
freeMemory = runtime.freeMemory();
......@@ -99,5 +219,6 @@ public class StatusResource
{
return usedMemory;
}
}
}
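
With these changes the `/status` response carries the loaded extension modules alongside version and memory. A sketch, assuming default Jackson serialization of the `@JsonProperty` getters above, of how the response body is produced (the `version` command added below prints the same information via `toString()`):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import io.druid.server.StatusResource;

public class StatusJsonDemo
{
  public static void main(String[] args) throws Exception
  {
    final ObjectMapper mapper = new ObjectMapper();
    // Shape: {"version":"...","modules":[{"name":"...","artifact":"...","version":"..."}],"memory":{...}}
    System.out.println(mapper.writeValueAsString(StatusResource.getStatus()));
  }
}
```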
......@@ -32,6 +32,9 @@
<body>
<div class="container">
<div>
<h2>Druid Version: ${pom.version} Druid API Version: ${druid.api.version}</h2>
</div>
<div>
<a href="view.html">View Information about the Cluster</a>
</div>
......
......@@ -29,6 +29,7 @@ import org.apache.curator.framework.api.CuratorEventType;
import org.apache.curator.framework.api.CuratorListener;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
......@@ -50,6 +51,12 @@ public class CuratorInventoryManagerTest extends io.druid.curator.CuratorTestBas
exec = Execs.singleThreaded("curator-inventory-manager-test-%s");
}
@After
public void tearDown() throws Exception
{
tearDownServerAndCurator();
}
@Test
public void testSanity() throws Exception
{
......
......@@ -27,7 +27,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.27-SNAPSHOT</version>
<version>0.6.31-SNAPSHOT</version>
</parent>
<dependencies>
......@@ -51,8 +51,20 @@
<version>${project.parent.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
......
......@@ -53,7 +53,7 @@ import java.util.List;
*/
@Command(
name = "broker",
description = "Runs a broker node, see http://druid.io/docs/0.6.26/Broker.html for a description"
description = "Runs a broker node, see http://druid.io/docs/0.6.30/Broker.html for a description"
)
public class CliBroker extends ServerRunnable
{
......
......@@ -63,7 +63,7 @@ import java.util.List;
*/
@Command(
name = "coordinator",
description = "Runs the Coordinator, see http://druid.io/docs/0.6.26/Coordinator.html for a description."
description = "Runs the Coordinator, see http://druid.io/docs/0.6.30/Coordinator.html for a description."
)
public class CliCoordinator extends ServerRunnable
{
......
......@@ -41,7 +41,7 @@ import java.util.List;
*/
@Command(
name = "hadoop",
description = "Runs the batch Hadoop Druid Indexer, see http://druid.io/docs/0.6.26/Batch-ingestion.html for a description."
description = "Runs the batch Hadoop Druid Indexer, see http://druid.io/docs/0.6.30/Batch-ingestion.html for a description."
)
public class CliHadoopIndexer implements Runnable
{
......
......@@ -42,7 +42,7 @@ import java.util.List;
*/
@Command(
name = "historical",
description = "Runs a Historical node, see http://druid.io/docs/0.6.26/Historical.html for a description"
description = "Runs a Historical node, see http://druid.io/docs/0.6.30/Historical.html for a description"
)
public class CliHistorical extends ServerRunnable
{
......
......@@ -28,6 +28,7 @@ import com.metamx.common.logger.Logger;
import io.airlift.command.Command;
import io.druid.guice.IndexingServiceFirehoseModule;
import io.druid.guice.IndexingServiceModuleHelper;
import io.druid.guice.IndexingServiceTaskLogsModule;
import io.druid.guice.Jerseys;
import io.druid.guice.JsonConfigProvider;
import io.druid.guice.LazySingleton;
......@@ -103,7 +104,8 @@ public class CliMiddleManager extends ServerRunnable
);
}
},
new IndexingServiceFirehoseModule()
new IndexingServiceFirehoseModule(),
new IndexingServiceTaskLogsModule()
);
}
}
......@@ -32,6 +32,7 @@ import com.metamx.common.logger.Logger;
import io.airlift.command.Command;
import io.druid.guice.IndexingServiceFirehoseModule;
import io.druid.guice.IndexingServiceModuleHelper;
import io.druid.guice.IndexingServiceTaskLogsModule;
import io.druid.guice.JacksonConfigProvider;
import io.druid.guice.Jerseys;
import io.druid.guice.JsonConfigProvider;
......@@ -70,6 +71,7 @@ import io.druid.indexing.overlord.scaling.ResourceManagementStrategy;
import io.druid.indexing.overlord.scaling.SimpleResourceManagementConfig;
import io.druid.indexing.overlord.scaling.SimpleResourceManagementStrategy;
import io.druid.indexing.overlord.setup.WorkerSetupData;
import io.druid.indexing.worker.config.WorkerConfig;
import io.druid.server.http.RedirectFilter;
import io.druid.server.http.RedirectInfo;
import io.druid.server.initialization.JettyServerInitializer;
......@@ -93,7 +95,7 @@ import java.util.List;
*/
@Command(
name = "overlord",
description = "Runs an Overlord node, see http://druid.io/docs/0.6.26/Indexing-Service.html for a description"
description = "Runs an Overlord node, see http://druid.io/docs/0.6.30/Indexing-Service.html for a description"
)
public class CliOverlord extends ServerRunnable
{
......@@ -166,6 +168,8 @@ public class CliOverlord extends ServerRunnable
private void configureRunners(Binder binder)
{
JsonConfigProvider.bind(binder, "druid.worker", WorkerConfig.class);
PolyBind.createChoice(
binder,
"druid.indexer.runner.type",
......@@ -208,7 +212,8 @@ public class CliOverlord extends ServerRunnable
JsonConfigProvider.bind(binder, "druid.indexer.autoscale", SimpleResourceManagementConfig.class);
}
},
new IndexingServiceFirehoseModule()
new IndexingServiceFirehoseModule(),
new IndexingServiceTaskLogsModule()
);
}
......
......@@ -30,7 +30,7 @@ import java.util.List;
*/
@Command(
name = "realtime",
description = "Runs a realtime node, see http://druid.io/docs/0.6.26/Realtime.html for a description"
description = "Runs a realtime node, see http://druid.io/docs/0.6.30/Realtime.html for a description"
)
public class CliRealtime extends ServerRunnable
{
......
......@@ -42,7 +42,7 @@ import java.util.concurrent.Executor;
*/
@Command(
name = "realtime",
description = "Runs a standalone realtime node for examples, see http://druid.io/docs/0.6.26/Realtime.html for a description"
description = "Runs a standalone realtime node for examples, see http://druid.io/docs/0.6.30/Realtime.html for a description"
)
public class CliRealtimeExample extends ServerRunnable
{
......
......@@ -27,6 +27,8 @@ import io.druid.cli.convert.ConvertProperties;
import io.druid.cli.validate.DruidJsonValidator;
import io.druid.initialization.Initialization;
import io.druid.server.initialization.ExtensionsConfig;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import java.util.List;
......@@ -41,7 +43,7 @@ public class Main
builder.withDescription("Druid command-line runner.")
.withDefaultCommand(Help.class)
.withCommands(Help.class);
.withCommands(Help.class, Version.class);
builder.withGroup("server")
.withDescription("Run one of the Druid server types.")
......
/*
* Druid - a distributed column store.
* Copyright (C) 2012, 2013 Metamarkets Group Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
package io.druid.cli;
import io.airlift.command.Command;
import io.druid.server.StatusResource;
@Command(
name = "version",
description = "Returns Druid version information"
)
public class Version implements Runnable
{
@Override
public void run()
{
System.out.println(StatusResource.getStatus());
}
}