diff --git a/docs/content/ingestion/index.md b/docs/content/ingestion/index.md
index 1529455ff32a6ff7408540c749f61b467a8d03e6..02fa1cc548e9e9e6f433084059df8ef0a105e1ec 100644
--- a/docs/content/ingestion/index.md
+++ b/docs/content/ingestion/index.md
@@ -87,7 +87,8 @@ An example dataSchema is shown below:
     "segmentGranularity" : "DAY",
     "queryGranularity" : "NONE",
     "intervals" : [ "2013-08-31/2013-09-01" ]
-  }
+  },
+  "transformSpec" : null
 }
 ```
 
@@ -97,6 +98,7 @@ An example dataSchema is shown below:
 | parser | JSON Object | Specifies how ingested data can be parsed. | yes |
 | metricsSpec | JSON Object array | A list of [aggregators](../querying/aggregations.html). | yes |
 | granularitySpec | JSON Object | Specifies how to create segments and roll up data. | yes |
+| transformSpec | JSON Object | Specifies how to filter and transform input data. See [transform specs](../ingestion/transform-spec.html).| no |
 
 ## Parser
 
@@ -244,6 +246,9 @@ for the `comment` column.
 }
 ```
 
+## metricsSpec
+ The `metricsSpec` is a list of [aggregators](../querying/aggregations.html). If `rollup` is false in the granularity spec, the metrics spec should be an empty list and all columns should be defined in the `dimensionsSpec` instead (without rollup, there isn't a real distinction between dimensions and metrics at ingestion time). This is optional, however.
+
 ## GranularitySpec
 
 The default granularity spec is `uniform`, and can be changed by setting the `type` field.
@@ -270,6 +275,10 @@ This spec is used to generate segments with arbitrary intervals (it tries to cre
 | rollup | boolean | rollup or not | no (default == true) |
 | intervals | string | A list of intervals for the raw data being ingested. Ignored for real-time ingestion. | no. If specified, batch ingestion tasks may skip determining partitions phase which results in faster ingestion. |
 
+# Transform Spec
+
+Transform specs allow Druid to transform and filter input data during ingestion.
See [Transform specs](../ingestion/transform-spec.html)
+
 # IO Config
 
 Stream Push Ingestion: Stream push ingestion with Tranquility does not require an IO Config.
diff --git a/docs/content/ingestion/transform-spec.md b/docs/content/ingestion/transform-spec.md
new file mode 100644
index 0000000000000000000000000000000000000000..eedaaa6950bab29be24b9eeabc9f236ca11d0bd8
--- /dev/null
+++ b/docs/content/ingestion/transform-spec.md
@@ -0,0 +1,84 @@
+---
+layout: doc_page
+---
+
+# Transform Specs
+
+Transform specs allow Druid to filter and transform input data during ingestion.
+
+## Syntax
+
+The syntax for the transformSpec is shown below:
+
+```
+"transformSpec": {
+  "transforms": <list of transforms>,
+  "filter": <filter>
+}
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|transforms|A list of [transforms](#transforms) to be applied to input rows. |no|
+|filter|A [filter](../querying/filters.html) that will be applied to input rows; only rows that pass the filter will be ingested.|no|
+
+## Transforms
+
+The `transforms` list allows the user to specify a set of column transformations to be performed on input data.
+
+Transforms allow adding new fields to input rows. Each transform has a "name" (the name of the new field) which can be referred to by DimensionSpecs, AggregatorFactories, etc.
+
+A transform behaves as a "row function", taking an entire row as input and outputting a column value.
+
+If a transform has the same name as a field in an input row, then it will shadow the original field. Transforms that shadow fields may still refer to the fields they shadow. This can be used to transform a field "in-place".
+
+Transforms do have some limitations. They can only refer to fields present in the actual input rows; in particular, they cannot refer to other transforms. And they cannot remove fields, only add them. However, they can shadow a field with another field containing all nulls, which will act similarly to removing the field.
+
+Note that the transforms are applied before the filter.
+
+### Expression Transform
+
+Druid currently supports one kind of transform, the expression transform.
+
+An expression transform has the following syntax:
+
+```
+{
+  "type": "expression",
+  "name": <output name>,
+  "expression": <expression>
+}
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|name|The output field name of the expression transform.|yes|
+|expression|An [expression](../misc/math-expr.html) that will be applied to input rows to produce a value for the transform's output field.|yes|
+
+For example, the following expression transform prepends "foo" to the values of a `page` column in the input data, and creates a `fooPage` column.
+
+```
+  {
+    "type": "expression",
+    "name": "fooPage",
+    "expression": "concat('foo', page)"
+  }
+```
+
+## Filtering
+
+The transformSpec allows Druid to filter out input rows during ingestion. A row that fails to pass the filter will not be ingested.
+
+Any of Druid's standard [filters](../querying/filters.html) can be used.
+
+Note that the filtering takes place after the transforms, so filters will operate on transformed rows and not the raw input data if transforms are present.
+
+For example, the following filter would ingest only input rows where a `country` column has the value "United States":
+
+```
+"filter": {
+  "type": "selector",
+  "dimension": "country",
+  "value": "United States"
+}
+```
\ No newline at end of file
diff --git a/docs/content/misc/math-expr.md b/docs/content/misc/math-expr.md
index abcebdd3b5e347a0ce2008c4341fa871d88dc59c..d8214916c2290aa4647762faa345343e2ac7019b 100644
--- a/docs/content/misc/math-expr.md
+++ b/docs/content/misc/math-expr.md
@@ -2,6 +2,12 @@
 layout: doc_page
 ---
 
+# Druid Expressions
+
+
+This feature is still experimental. It has not been optimized for performance yet, and its implementation is known to have significant inefficiencies.
+
+
 This expression language supports the following operators (listed in decreasing order of precedence).
 
 |Operators|Description|
diff --git a/docs/content/querying/virtual-columns.md b/docs/content/querying/virtual-columns.md
new file mode 100644
index 0000000000000000000000000000000000000000..117b75ea55919eb4c6b0457736e956e9337de67c
--- /dev/null
+++ b/docs/content/querying/virtual-columns.md
@@ -0,0 +1,60 @@
+---
+layout: doc_page
+---
+
+# Virtual Columns
+
+Virtual columns are queryable column "views" created from a set of columns during a query.
+
+A virtual column can potentially draw from multiple underlying columns, although a virtual column always presents itself as a single column.
+
+Virtual columns can be used as dimensions or as inputs to aggregators.
+
+Each Druid query can accept a list of virtual columns as a parameter. The following scan query is provided as an example:
+
+```
+{
+  "queryType": "scan",
+  "dataSource": "page_data",
+  "columns":[],
+  "virtualColumns": [
+    {
+      "type": "expression",
+      "name": "fooPage",
+      "expression": "concat('foo', page)",
+      "outputType": "STRING"
+    },
+    {
+      "type": "expression",
+      "name": "tripleWordCount",
+      "expression": "wordCount * 3",
+      "outputType": "LONG"
+    }
+  ],
+  "intervals": [
+    "2013-01-01/2019-01-02"
+  ]
+}
+```
+
+
+## Virtual Column Types
+
+### Expression virtual column
+
+The expression virtual column has the following syntax:
+
+```
+{
+  "type": "expression",
+  "name": <name>,
+  "expression": <expression>,
+  "outputType": <output type>
+}
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|name|The name of the virtual column.|yes|
+|expression|An [expression](../misc/math-expr.html) that takes a row as input and outputs a value for the virtual column.|yes|
+|outputType|The expression's output will be coerced to this type.
Can be LONG, FLOAT, DOUBLE, or STRING.|no, default is FLOAT|
\ No newline at end of file
diff --git a/docs/content/toc.md b/docs/content/toc.md
index 6553e9676e7774e12d8a7518f1207f18bde3c159..a8cd7ed2d75b0fcd4b74563952a6eea93882bebd 100644
--- a/docs/content/toc.md
+++ b/docs/content/toc.md
@@ -32,6 +32,7 @@ layout: toc
   * [Stream Pull](/docs/VERSION/ingestion/stream-pull.html)
   * [Updating Existing Data](/docs/VERSION/ingestion/update-existing-data.html)
   * [Ingestion Tasks](/docs/VERSION/ingestion/tasks.html)
+  * [Transform Specs](/docs/VERSION/ingestion/transform-spec.html)
   * [FAQ](/docs/VERSION/ingestion/faq.html)
 
 ## Querying
@@ -60,6 +61,7 @@ layout: toc
   * [Multitenancy](/docs/VERSION/querying/multitenancy.html)
   * [Caching](/docs/VERSION/querying/caching.html)
   * [Sorting Orders](/docs/VERSION/querying/sorting-orders.html)
+  * [Virtual Columns](/docs/VERSION/querying/virtual-columns.html)
 
 ## Design
   * [Overview](/docs/VERSION/design/design.html)
@@ -127,5 +129,6 @@ layout: toc
 
 
 ## Misc
+  * [Druid Expressions Language](/docs/VERSION/misc/math-expr.html)
   * [Papers & Talks](/docs/VERSION/misc/papers-and-talks.html)
   * [Thanks](/thanks.html)