Considerations when Using GPORCA
To execute queries optimally with GPORCA, ensure that the following query criteria are met:
- The table does not contain multi-column partition keys.
- The multi-level partitioned table is a uniform multi-level partitioned table. See .
- The server configuration parameter optimizer_enable_master_only_queries
is set to on when running against master-only tables such as the system
table pg_attribute. For information about the parameter, see the Greenplum
Database Reference Guide. Enabling this parameter decreases the performance of
short-running catalog queries. To avoid this issue, set this parameter only for a
session or a query.
- Statistics have been collected on the root partition of a partitioned table.
If the partitioned table contains more than 20,000 partitions, consider a redesign of the
table schema.
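A minimal sketch of meeting two of the criteria above, assuming a hypothetical partitioned table named sales. The first part limits optimizer_enable_master_only_queries to a single transaction with SET LOCAL (or to a session with SET and RESET), so other catalog queries are not slowed down. The last statement collects statistics on the root partition with ANALYZE ROOTPARTITION; check the ANALYZE reference for your Greenplum release to confirm that form is available.

```sql
-- Scope the parameter to one transaction; SET LOCAL reverts at COMMIT,
-- so other short-running catalog queries are not affected.
BEGIN;
SET LOCAL optimizer_enable_master_only_queries = on;
SELECT attname, atttypid
FROM pg_attribute
WHERE attrelid = 'sales'::regclass   -- "sales" is a hypothetical table
  AND attnum > 0;
COMMIT;

-- Alternatively, scope the parameter to the current session only.
SET optimizer_enable_master_only_queries = on;
-- ... run master-only queries here ...
RESET optimizer_enable_master_only_queries;

-- Collect statistics on the root partition of the partitioned table.
ANALYZE ROOTPARTITION sales;
```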
These server configuration parameters affect GPORCA query processing.
- optimizer_cte_inlining_bound controls the amount of inlining performed
for common table expression (CTE) queries (queries that contain a WITH
clause).
- optimizer_force_multistage_agg forces GPORCA to choose a three-stage
aggregate plan for a scalar distinct-qualified aggregate.
- optimizer_force_three_stage_scalar_dqa forces GPORCA to choose a plan
with multistage aggregates when such a plan alternative is generated.
- optimizer_join_order_threshold specifies the maximum number of join
children for which GPORCA uses the dynamic programming-based join ordering algorithm.
- optimizer_nestloop_factor controls the nested loop join cost factor that GPORCA
applies during query optimization.
- optimizer_parallel_union controls the amount of parallelization that
occurs for queries that contain a UNION or UNION ALL
clause. When the value is on, GPORCA can generate a query plan in which the child
operations of a UNION or UNION ALL operation execute in
parallel on segment instances.
- optimizer_sort_factor controls the cost factor that GPORCA applies to
sorting operations during query optimization. The cost factor can be adjusted for queries
when data skew is present.
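The parameters above can be inspected with SHOW and changed for the current session with SET. The sketch below assumes two hypothetical tables, t1 and t2, with compatible columns; it turns on optimizer_parallel_union so you can compare the plan GPORCA produces for a UNION ALL query. The setting shown is an illustration, not a tuning recommendation.

```sql
-- Inspect the current settings.
SHOW optimizer_join_order_threshold;
SHOW optimizer_parallel_union;

-- Allow the child operations of UNION / UNION ALL to run in parallel on segments.
SET optimizer_parallel_union = on;

-- t1 and t2 are hypothetical tables with a compatible column list.
EXPLAIN
SELECT id, amount FROM t1
UNION ALL
SELECT id, amount FROM t2;

-- Return to the configured value for the rest of the session.
RESET optimizer_parallel_union;
```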
These server configuration parameters control the display and logging of information.
- optimizer_print_missing_stats controls the display of information about
columns that have missing statistics for a query (default is true).
- optimizer_print_optimization_stats controls the logging of GPORCA query
optimization metrics for a query (default is off).
For information about the parameters, see the Greenplum Database Reference
Guide.
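A short sketch of turning on both parameters for one session before examining a query; the sales table and its columns are hypothetical. Where the missing-statistics information and the optimization metrics appear (client messages or the server log) depends on the parameter and the release, so check the Reference Guide entry for each.

```sql
SET optimizer_print_missing_stats = on;       -- report columns lacking statistics
SET optimizer_print_optimization_stats = on;  -- log GPORCA optimization metrics

-- "sales", "region", and "amount" are hypothetical names.
EXPLAIN
SELECT region, sum(amount)
FROM sales
GROUP BY region;

-- Restore the configured values.
RESET optimizer_print_optimization_stats;
RESET optimizer_print_missing_stats;
```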
GPORCA generates minidumps to describe the optimization context for a given query. The minidump files are used by Pivotal support to analyze Greenplum
Database issues. The information in the file is not in a format that can be easily used
by customers for debugging or troubleshooting. The minidump file is located under the master
data directory and uses the following naming format:
Minidump_date_time.mdp
For information about the minidump file, see the server configuration parameter
optimizer_minidump in the Greenplum Database Reference
Guide.
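As a sketch, assuming optimizer_minidump accepts the values onerror and always as described in the Reference Guide, a minidump can be requested for every query in a session (for example, when support asks for one) and then switched back. The query is hypothetical.

```sql
-- Write a minidump for every query that GPORCA optimizes in this session.
SET optimizer_minidump = 'always';

SELECT count(*) FROM sales WHERE region = 'EMEA';  -- hypothetical query

-- Return to the configured value (by default, minidumps only on optimizer errors).
RESET optimizer_minidump;
```

The resulting Minidump_date_time.mdp file is written under the master data directory.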
When the EXPLAIN ANALYZE command uses GPORCA, the EXPLAIN
plan shows only the number of partitions that are eliminated. The scanned partitions are
not shown. To show the names of the scanned partitions in the segment logs, set the server
configuration parameter gp_log_dynamic_partition_pruning to
on. This example SET command enables the parameter.
SET gp_log_dynamic_partition_pruning = on;
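A minimal end-to-end sketch, assuming a hypothetical table sales partitioned by a sale_date column: enable the parameter, run EXPLAIN ANALYZE with a predicate that lets GPORCA eliminate partitions, and then look in the segment logs for the names of the partitions that were scanned.

```sql
SET gp_log_dynamic_partition_pruning = on;

-- Partitions outside January 2017 can be eliminated; the scanned
-- partitions are reported in the segment logs, not in the plan.
EXPLAIN ANALYZE
SELECT count(*)
FROM sales
WHERE sale_date >= '2017-01-01' AND sale_date < '2017-02-01';

RESET gp_log_dynamic_partition_pruning;
```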