From 4ed9f868153376c9abab23afecac0485f5bf345d Mon Sep 17 00:00:00 2001
From: Lisa Owen
Date: Mon, 14 Oct 2019 10:42:42 -0700
Subject: [PATCH] docs - aggregate-related updates (#8759)

* docs - update pg_aggregate and related aggregate sql commands

* planner supports ordered-set aggregates

* add missing periods
---
 .../query/topics/functions-operators.xml | 3 +-
 .../topics/query-piv-opt-limitations.xml | 3 +-
 .../sql_commands/ALTER_AGGREGATE.xml | 47 ++++-
 .../sql_commands/ALTER_EXTENSION.xml | 6 +-
 .../dita/ref_guide/sql_commands/COMMENT.xml | 20 +-
 .../sql_commands/CREATE_AGGREGATE.xml | 48 +++--
 .../ref_guide/sql_commands/DROP_AGGREGATE.xml | 49 ++++-
 .../system_catalogs/pg_aggregate.xml | 197 ++++++++++++++++--
 8 files changed, 297 insertions(+), 76 deletions(-)

diff --git a/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml b/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml
index de574e2f16..4f2778f1c2 100644
--- a/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml
+++ b/gpdb-doc/dita/admin_guide/query/topics/functions-operators.xml
@@ -783,8 +783,7 @@ SELECT foo();
Advanced Aggregate Functions

The following built-in advanced aggregate functions are Greenplum extensions of the - PostgreSQL database. These functions are immutable. Greenplum Database does - not support the PostgreSQL ordered-set aggregate functions. + PostgreSQL database. These functions are immutable. The Greenplum MADlib Extension for Analytics provides additional advanced functions to perform statistical analysis and machine learning with Greenplum Database data. See FIELDSELECT

  • Aggregate functions that take set operators as input arguments.
  • -
  • percentile_* window functions (Greenplum Database does not support - ordered-set aggregate functions).
  • +
  • percentile_* window functions (ordered-set aggregate functions).
  • Inverse distribution functions.
  • Queries that execute functions that are defined with the ON MASTER or ON ALL SEGMENTS attribute.
  • diff --git a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_AGGREGATE.xml b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_AGGREGATE.xml index 7040c8657b..e432ae48e0 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_AGGREGATE.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_AGGREGATE.xml @@ -1,11 +1,16 @@ -ALTER AGGREGATE

    Changes the definition of an aggregate function

    SynopsisALTER AGGREGATE name ( type [ , ... ] ) RENAME TO new_name +ALTER AGGREGATE

    Changes the definition of an aggregate function

    Synopsis
    ALTER AGGREGATE name ( aggregate_signature ) RENAME TO new_name
-ALTER AGGREGATE name ( type [ , ... ] ) OWNER TO new_owner
+ALTER AGGREGATE name ( aggregate_signature ) OWNER TO new_owner
-ALTER AGGREGATE name ( type [ , ... ] ) SET SCHEMA new_schema
    Description

    ALTER AGGREGATE changes the definition of an aggregate +ALTER AGGREGATE name ( aggregate_signature ) SET SCHEMA new_schema +

    where aggregate_signature is:

+ * |
+[ argmode ] [ argname ] argtype [ , ... ] |
+[ [ argmode ] [ argname ] argtype [ , ... ] ] ORDER BY [ argmode ] [ argname ] argtype [ , ... ]
+
    Description

    ALTER AGGREGATE changes the definition of an aggregate function.

    You must own the aggregate function to use ALTER AGGREGATE. To change the schema of an aggregate function, you must also have CREATE privilege on the new schema. To alter the owner, you must also be a direct @@ -14,9 +19,39 @@ privilege on the aggregate function's schema. (These restrictions enforce that altering the owner does not do anything you could not do by dropping and recreating the aggregate function. However, a superuser can alter ownership of any aggregate function anyway.)

    ParametersnameThe name (optionally schema-qualified) of an existing aggregate function. -typeAn input data type on which the aggregate function operates. To reference -a zero-argument aggregate function, write * in place of the list of input -data types. new_nameThe new name of the aggregate function. new_ownerThe new owner of the aggregate function. new_schemaThe new schema for the aggregate function.
    Examples

    To rename the aggregate function myavg for type integer to + + + argmode + The mode of an argument: IN or VARIADIC. + If omitted, the default is IN. + + + argname + The name of an argument. Note that ALTER AGGREGATE does not + actually pay any attention to argument names, since only the argument data types + are needed to determine the aggregate function's identity. + + + argtype + An input data type on which the aggregate function operates. To reference + a zero-argument aggregate function, write * in place of the + list of input data types. To reference an ordered-set aggregate function, write + ORDER BY between the direct and aggregated argument + specifications. + +new_nameThe new name of the aggregate function. new_ownerThe new owner of the aggregate function. new_schemaThe new schema for the aggregate function.

    +
    + Notes +

    The recommended syntax for referencing an ordered-set aggregate is to write + ORDER BY between the direct and aggregated argument specifications, + in the same style as in + CREATE AGGREGATE. + However, it will also work to omit ORDER BY and just run the + direct and aggregated argument specifications into a single list. In this + abbreviated form, if VARIADIC "any" was used in both the direct + and aggregated argument lists, write VARIADIC "any" only once.

    +
    +
    Examples

    To rename the aggregate function myavg for type integer to my_average:

    ALTER AGGREGATE myavg(integer) RENAME TO my_average;

    To change the owner of the aggregate function myavg for type integer to joe:

    ALTER AGGREGATE myavg(integer) OWNER TO joe;

    To move the aggregate function myavg for type integer into schema myschema:

    ALTER AGGREGATE myavg(integer) SET SCHEMA myschema;
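
    The ordered-set syntax described in the Notes can be sketched the same way. The aggregate mypercentile below is hypothetical and is assumed to take one direct argument of type float8 and one aggregated argument of type integer; the first statement uses the recommended ORDER BY form, and the commented-out statement shows the abbreviated single-list form that should reference the same aggregate:

```sql
-- Hypothetical ordered-set aggregate: mypercentile(float8 ORDER BY integer).
-- Recommended form: ORDER BY separates the direct and aggregated arguments.
ALTER AGGREGATE mypercentile(float8 ORDER BY integer) SET SCHEMA myschema;

-- Abbreviated alternative (use one form or the other, not both in sequence):
-- ALTER AGGREGATE mypercentile(float8, integer) SET SCHEMA myschema;
```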
    Compatibility

    There is no ALTER AGGREGATE statement in the SQL standard.

    See Also

    , diff --git a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_EXTENSION.xml b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_EXTENSION.xml index 553ce53b9f..a03f90a17b 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/ALTER_EXTENSION.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/ALTER_EXTENSION.xml @@ -44,9 +44,9 @@ where member_object is: and aggregate_signature is: -* | [ argmode ] [ argname ] argtype [ , ... ] | - [ [ argmode ] [ argname ] argtype [ , ... ] ] - ORDER BY [ argmode ] [ argname ] argtype [ , ... ] +* | +[ argmode ] [ argname ] argtype [ , ... ] | +[ [ argmode ] [ argname ] argtype [ , ... ] ] ORDER BY [ argmode ] [ argname ] argtype [ , ... ]

    Description diff --git a/gpdb-doc/dita/ref_guide/sql_commands/COMMENT.xml b/gpdb-doc/dita/ref_guide/sql_commands/COMMENT.xml index 9ddfd5e210..be9dd5a2b1 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/COMMENT.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/COMMENT.xml @@ -10,7 +10,7 @@ COMMENT ON { TABLE object_name | COLUMN relation_name.column_name | - AGGREGATE agg_name (agg_type [, ...]) | + AGGREGATE agg_name (agg_signature) | CAST (source_type AS target_type) | COLLATION object_name CONSTRAINT constraint_name ON table_name | @@ -38,6 +38,10 @@ TYPE object_name | VIEW object_name } IS 'text' +

    where agg_signature is:

+* |
+[ argmode ] [ argname ] argtype [ , ... ] |
+[ [ argmode ] [ argname ] argtype [ , ... ] ] ORDER BY [ argmode ] [ argname ] argtype [ , ... ]
    Description @@ -73,12 +77,6 @@ IS 'text' refer to a table, view, composite type, or foreign table.Greenplum Database does not support triggers. - - aggregate_type - An input data type on which the aggregate function operates. To reference a - zero-argument aggregate function, write * in place of the list of input - data types. - source_type The name of the source data type of the cast. @@ -89,9 +87,9 @@ IS 'text' argmode - The mode of a function argument: either IN, OUT, + The mode of a function or aggregate argument: either IN, OUT, INOUT, or VARIADIC. If omitted, the default is - IN. Note that COMMENT ON FUNCTION does not actually + IN. Note that COMMENT does not actually pay any attention to OUT arguments, since only the input arguments are needed to determine the function's identity. So it is sufficient to list the IN, INOUT, and VARIADIC arguments. @@ -99,13 +97,13 @@ IS 'text' argname - The name of a function argument. Note that COMMENT ON FUNCTION does + The name of a function or aggregate argument. Note that COMMENT ON FUNCTION does not actually pay any attention to argument names, since only the argument data types are needed to determine the function's identity. argtype - The data type(s) of the function's arguments (optionally schema-qualified), if any. + The data type of a function or aggregate argument. diff --git a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_AGGREGATE.xml b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_AGGREGATE.xml index cf136bb213..5c058569a7 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/CREATE_AGGREGATE.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/CREATE_AGGREGATE.xml @@ -6,7 +6,7 @@

    Defines a new aggregate function.

    SynopsisargnameCREATE AGGREGATE name ( [ argmode ] [ ] arg_data_type [ , ... ] ) ( + >SynopsisCREATE AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) ( SFUNC = statefunc, STYPE = state_data_type [ , SSPACE = state_data_size ] @@ -126,7 +126,8 @@ single-level aggregation that sends all the rows to the master and then applies only the statefunc to the rows.

    An aggregate function can provide an optional initial condition, an initial value for the - internal state value. This is specified and stored in the database as a value of type text, + internal state value. This is specified and stored in the database as a value of type + text, but it must be a valid external representation of a constant of the state value data type. If it is not supplied then the state value starts out NULL.

    If statefunc is declared STRICT, @@ -138,11 +139,11 @@ subsequent rows with all non-null input values. This is useful for implementing aggregates like max. Note that this behavior is only available when state_data_type is the same as the first - input_data_type. When these types are different, you must supply a + arg_data_type. When these types are different, you must supply a non-null initial condition or use a nonstrict transition function.

    If statefunc is not declared STRICT, then it will be called unconditionally at each input row, and must deal with NULL inputs - and NULL transition values for itself. This allows the aggregate author to + and NULL state values for itself. This allows the aggregate author to have full control over the aggregate's handling of NULL values.

    If the final function (ffunc) is declared STRICT, then it will not be called when the ending state value is @@ -164,7 +165,7 @@ href="https://www.postgresql.org/docs/9.4/xaggr.html#XAGGR-MOVING-AGGREGATES" scope="external" format="html">Moving-Aggregate Mode in the PostgreSQL documentation. This requires specifying the msfunc, - minfunc, and + minvfunc, and mstype functions, and optionally the mspace, mfinalfunc, @@ -184,9 +185,10 @@ direct arguments are required to match, in number and data types, the aggregated argument columns. This allows the values of those direct arguments to be added to the collection of aggregate-input rows as an additional "hypothetical" row.

    -

    Single argument aggregate functions, such as min or max, can sometimes be optimized by +

    Single argument aggregate functions, such as min or + max, can sometimes be optimized by looking into an index instead of scanning every input row. If this aggregate can be so - optimized, indicate it by specifying a sort operator. The basic requirement is that the + optimized, indicate it by specifying a sort operator. The basic requirement is that the aggregate must yield the first element in the sort ordering induced by the operator; in other words:

    SELECT agg(col) FROM tab; @@ -200,7 +202,7 @@ the specified operator is the "less than" or "greater than" strategy member of a B-tree index operator class.

    To be able to create an aggregate function, you must have USAGE privilege - on the argument types, the state type, and the return type, as well as + on the argument types, the state type(s), and the return type, as well as EXECUTE privilege on the transition and final functions.

    @@ -253,7 +255,7 @@ state_data_type - The data type for the aggregate state value. + The data type for the aggregate's state value. state_data_size @@ -279,6 +281,20 @@ to allow correct resolution of the aggregate result type when a polymorphic aggregate is being defined. + + combinefunc + The name of a combine function. This is a function of two arguments, both of type + state_data_type. It must return a value of + state_data_type. A combine function takes two transition state + values and returns a new transition state value representing the combined aggregation. + In Greenplum Database, if the result of the aggregate function is computed in a + segmented fashion, the combine function is invoked on the individual internal states in + order to combine them into an ending internal state. + Note that this function is also called in hash aggregate mode within a segment. + Therefore, if you call this aggregate function without a combine function, hash + aggregate is never chosen. Since hash aggregate is efficient, consider defining a + combine function whenever possible. + serialfunc An aggregate function whose state_data_type is @@ -359,20 +375,6 @@ no effect on run-time behavior, only on parse-time resolution of the data types and collations of the aggregate's arguments. - - combinefunc - The name of a combine function. This is a function of two arguments, both of type - state_data_type. It must return a value of - state_data_type. A combine function takes two transition state - values and returns a new transition state value representing the combined aggregation. - In Greenplum Database, if the result of the aggregate function is computed in a - segmented fashion, the combine function is invoked on the individual internal states in - order to combine them into an ending internal state. - Note that this function is also called in hash aggregate mode within a segment. 
- Therefore, if you call this aggregate function without a combine function, hash - aggregate is never chosen. Since hash aggregate is efficient, consider defining a - combine function whenever possible. -
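
    As an illustration of the combinefunc parameter described in this file, the following is a minimal sketch of an aggregate that supplies a combine function. The aggregate name my_int_sum is hypothetical, and the sketch assumes that the parameter is written with the COMBINEFUNC keyword in the CREATE AGGREGATE definition list. The built-in function int4pl (integer addition) serves as both the state function and the combine function, since adding two partial sums yields the combined sum:

```sql
-- Hypothetical aggregate; assumes COMBINEFUNC is the keyword for combinefunc.
CREATE AGGREGATE my_int_sum(int4) (
    SFUNC = int4pl,        -- state function: running_sum + input value
    STYPE = int4,          -- state_data_type
    COMBINEFUNC = int4pl,  -- merges two partial states of type int4
    INITCOND = '0'         -- non-null initial state
);

-- Usage sketch against a hypothetical table tab with an int4 column col:
-- SELECT my_int_sum(col) FROM tab;
```

    Because the combine function takes two state values and returns a state value of the same type, any function with the signature (state_data_type, state_data_type) returning state_data_type can fill this role.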
    diff --git a/gpdb-doc/dita/ref_guide/sql_commands/DROP_AGGREGATE.xml b/gpdb-doc/dita/ref_guide/sql_commands/DROP_AGGREGATE.xml index 3c600dee8d..9f7666d714 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/DROP_AGGREGATE.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/DROP_AGGREGATE.xml @@ -1,15 +1,52 @@ -DROP AGGREGATE

    Removes an aggregate function.

    SynopsisDROP AGGREGATE [IF EXISTS] name ( type [, ...] ) [CASCADE | RESTRICT]
    Description

    DROP AGGREGATE will delete an existing aggregate function. +DROP AGGREGATE

    Removes an aggregate function.

    SynopsisDROP AGGREGATE [IF EXISTS] name ( aggregate_signature ) [CASCADE | RESTRICT] +

    where aggregate_signature is:

+ * |
+[ argmode ] [ argname ] argtype [ , ... ] |
+[ [ argmode ] [ argname ] argtype [ , ... ] ] ORDER BY [ argmode ] [ argname ] argtype [ , ... ]
+
    Description

    DROP AGGREGATE will delete an existing aggregate function. To execute this command the current user must be the owner of the aggregate function.

    ParametersIF EXISTSDo not throw an error if the aggregate does not exist. A notice is -issued in this case. nameThe name (optionally schema-qualified) of an existing aggregate function. -typeAn input data type on which the aggregate function operates. To reference -a zero-argument aggregate function, write * in place -of the list of input data types. CASCADEAutomatically drop objects that depend on the aggregate function. +issued in this case. + + name + The name (optionally schema-qualified) of an existing aggregate function. + + + argmode + The mode of an argument: IN or VARIADIC. + If omitted, the default is IN. + + + argname + The name of an argument. Note that DROP AGGREGATE does not + actually pay any attention to argument names, since only the argument data types + are needed to determine the aggregate function's identity. + + + argtype + An input data type on which the aggregate function operates. To reference + a zero-argument aggregate function, write * in place of the + list of input data types. To reference an ordered-set aggregate function, write + ORDER BY between the direct and aggregated argument + specifications. + + CASCADEAutomatically drop objects that depend on the aggregate function. RESTRICTRefuse to drop the aggregate function if any objects depend on it. -This is the default.
    Examples

    To remove the aggregate function myavg for type integer:

    DROP AGGREGATE myavg(integer);
    Compatibility

    There is no DROP AGGREGATE statement in the SQL standard. +This is the default.

    +
    + Notes +

    Alternative syntaxes for referencing ordered-set aggregates are described under + ALTER AGGREGATE.

    +
    +
    Examples

    To remove the aggregate function myavg for type integer:

    DROP AGGREGATE myavg(integer); +

    To remove the hypothetical-set aggregate function myrank, + which takes an arbitrary list of ordering columns and a matching list of direct + arguments:

    + DROP AGGREGATE myrank(VARIADIC "any" ORDER BY VARIADIC "any"); +
    Compatibility

    There is no DROP AGGREGATE statement in the SQL standard.

    See Also

    ,

    diff --git a/gpdb-doc/dita/ref_guide/system_catalogs/pg_aggregate.xml b/gpdb-doc/dita/ref_guide/system_catalogs/pg_aggregate.xml index f7c0b8271f..e9b09dacb6 100644 --- a/gpdb-doc/dita/ref_guide/system_catalogs/pg_aggregate.xml +++ b/gpdb-doc/dita/ref_guide/system_catalogs/pg_aggregate.xml @@ -1,26 +1,177 @@ -pg_aggregate

    The pg_aggregate table stores information about aggregate -functions. An aggregate function is a function that operates on a set -of values (typically one column from each row that matches a query condition) -and returns a single value computed from all these values. Typical aggregate -functions are sum, count, and max. -Each entry in pg_aggregate is an extension of an entry -in pg_proc. The pg_proc entry carries -the aggregate's name, input and output data types, and other information -that is similar to ordinary functions.

    pg_catalog.pg_aggregatecolumntypereferencesdescription -aggfnoidregprocpg_proc.oidAggregate function OID -aggtransfnregprocpg_proc.oidTransition function OID -aggcombinefnregprocCombine function OID (zero if none) -aggfinalfnregprocpg_proc.oidFinal function OID (zero if none) -agginitvaltextThe initial value of the transition state. This -is a text field containing the initial value in its external string representation. -If this field is NULL, the transition state value starts out NULL -aggorderedBooleanIf true, the aggregate is defined -as ORDERED. -aggsortopoidpg_operator.oidAssociated sort operator OID (zero if none) -aggtranstypeoidpg_type.oidData type of the aggregate function's internal -transition (state) data -aggtransspaceint4pg_type.int4Estimated size of state data (0 for default estimate) -
    + + pg_aggregate + +

    The pg_aggregate table stores information about aggregate functions. An + aggregate function is a function that operates on a set of values (typically one column from + each row that matches a query condition) and returns a single value computed from all these + values. Typical aggregate functions are sum, count, and + max. Each entry in pg_aggregate is an extension of an + entry in pg_proc. The pg_proc entry carries the aggregate's + name, input and output data types, and other information that is similar to ordinary + functions.

    + + pg_catalog.pg_aggregate + + + + + + + + column + type + references + description + + + + + aggfnoid + regproc + pg_proc.oid + OID of the aggregate function + + + aggkind + char + + Aggregate kind: n for normal + aggregates, o for ordered-set aggregates, or + h for hypothetical-set aggregates + + + aggnumdirectargs + int2 + + Number of direct (non-aggregated) arguments of an + ordered-set or hypothetical-set aggregate, counting a variadic array as + one argument. If equal to pronargs, the aggregate must be + variadic and the variadic array describes the aggregated arguments as well + as the final direct arguments. Always zero for normal aggregates. + + + aggtransfn + regproc + pg_proc.oid + Transition function OID + + + aggfinalfn + regproc + pg_proc.oid + Final function OID (zero if none) + + + aggcombinefn + regproc + pg_proc.oid + Combine function OID (zero if none) + + + aggserialfn + regproc + pg_proc.oid + OID of the serialization function to convert + transtype to bytea (zero if none) + + + aggdeserialfn + regproc + pg_proc.oid + OID of the deserialization function to convert + bytea to transtype (zero if none) + + + aggmtransfn + regproc + pg_proc.oid + Forward transition function OID for moving-aggregate mode + (zero if none) + + + aggminvtransfn + regproc + pg_proc.oid + Inverse transition function OID for moving-aggregate mode + (zero if none) + + + aggmfinalfn + regproc + pg_proc.oid + Final function OID for moving-aggregate mode (zero if + none) + + + aggfinalextra + bool + + True to pass extra dummy arguments to + aggfinalfn + + + aggmfinalextra + bool + + True to pass extra dummy arguments to + aggmfinalfn + + + aggsortop + oid + pg_operator.oid + Associated sort operator OID (zero if none) + + + aggtranstype + oid + pg_type.oid + Data type of the aggregate function's internal transition (state) + data + + + aggtransspace + int4 + + Approximate average size (in bytes) of the transition state + data, or zero to use a default estimate + + + 
aggmtranstype + oid + pg_type.oid + Data type of the aggregate function's internal transition (state) + data for moving-aggregate mode (zero if none) + + + aggmtransspace + int4 + + Approximate average size (in bytes) of the transition state + data for moving-aggregate mode, or zero to use a default estimate + + + agginitval + text + + The initial value of the transition state. This is a text field + containing the initial value in its external string representation. If this field is + NULL, the transition state value starts out NULL. + + + aggminitval + text + + The initial value of the transition state for moving- + aggregate mode. This is a text field containing the initial value in its + external string representation. If this field is NULL, the transition state + value starts out NULL. + + + +
    + +
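
    The aggkind and aggnumdirectargs columns added to the pg_aggregate table above can be inspected with a query along these lines. This is a sketch only; the column aliases are illustrative, and the rows returned depend on which aggregates are installed in the database:

```sql
-- List ordered-set ('o') and hypothetical-set ('h') aggregates
-- together with their number of direct (non-aggregated) arguments.
SELECT aggfnoid         AS aggregate_name,
       aggkind          AS kind,
       aggnumdirectargs AS direct_args
FROM pg_catalog.pg_aggregate
WHERE aggkind IN ('o', 'h')
ORDER BY aggkind;
```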
    -- GitLab