Query Profiling Examine the query plans of poorly performing queries to identify possible performance tuning opportunities.

Greenplum Database devises a query plan for each query. Choosing the right query plan to match the query and data structure is necessary for good performance. A query plan defines how Greenplum Database will run the query in the parallel execution environment.

The query optimizer uses data statistics maintained by the database to choose a query plan with the lowest possible cost. Cost is measured in disk I/O, shown as units of disk page fetches. The goal is to minimize the total execution cost for the plan.

View the plan for a given query with the EXPLAIN command. EXPLAIN shows the query optimizer's estimated cost for the query plan. For example:

EXPLAIN SELECT * FROM names WHERE id=22;

EXPLAIN ANALYZE runs the statement in addition to displaying its plan. This is useful for determining how close the optimizer's estimates are to reality. For example:

EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;

In Greenplum Database, the default GPORCA optimizer co-exists with the Postgres query optimizer. The EXPLAIN output generated by GPORCA is different than the output generated by the Postgres query optimizer.

By default, Greenplum Database uses GPORCA to generate an execution plan for a query when possible.

When the EXPLAIN ANALYZE command uses GPORCA, the EXPLAIN plan shows only the number of partitions that are being eliminated. The scanned partitions are not shown. To show name of the scanned partitions in the segment logs set the server configuration parameter gp_log_dynamic_partition_pruning to on. This example SET command enables the parameter.

SET gp_log_dynamic_partition_pruning = on;

For information about GPORCA, see .

Reading EXPLAIN Output

A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort.

Read plans from the bottom to the top: each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations: sequential, index, or bitmap index scans. If the query requires joins, aggregations, sorts, or other operations on the rows, there are additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually Greenplum Database motion nodes: redistribute, explicit redistribute, broadcast, or gather motions. These operations move rows between segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree and shows the basic node type and the following execution cost estimates for that plan node:

Note the following:

EXPLAIN Example

The following example describes how to read an EXPLAIN query plan for a query:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';                      QUERY PLAN ------------------------------------------------------------ Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)    -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)       Filter: name::text ~~ 'Joelle'::text

Read the plan from the bottom to the top. To start, the query optimizer sequentially scans the names table. Notice the WHERE clause is applied as a filter condition. This means the scan operation checks the condition for each row it scans and outputs only the rows that satisfy the condition.

The results of the scan operation are passed to a gather motion operation. In Greenplum Database, a gather motion is when segments send rows to the master. In this example, we have two segment instances that send to one master instance. This operation is working on slice1 of the parallel query execution plan. A query plan is divided into slices so the segments can work on portions of the query plan in parallel.

The estimated startup cost for this plan is 00.00 (no cost) and a total cost of 20.88 disk page fetches. The optimizer estimates this query will return one row.

Reading EXPLAIN ANALYZE Output

EXPLAIN ANALYZE plans and runs the statement. The EXPLAIN ANALYZE plan shows the actual execution cost along with the optimizer's estimates. This allows you to see if the optimizer's estimates are close to reality. EXPLAIN ANALYZE also shows the following:

EXPLAIN ANALYZE Examples

This example describes how to read an EXPLAIN ANALYZE query plan using the same query. The bold parts of the plan show actual timing and rows returned for each plan node, as well as memory and time statistics for the whole query.

EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';                      QUERY PLAN ------------------------------------------------------------ Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13) Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms. -> Seq Scan on names (cost=0.00..20.88 rows=1 width=13) Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms. Filter: name = 'Joelle'::text Slice statistics: (slice0) Executor memory: 135K bytes. (slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Total runtime: 22.548 ms

Read the plan from the bottom to the top. The total elapsed time to run this query was 22.548 milliseconds.

The sequential scan operation had only one segment (seg0) that returned rows, and it returned just 1 row. It took 0.255 milliseconds to find the first row and 0.486 to scan all rows. This result is close to the optimizer's estimate: the query optimizer estimated it would return one row for this query. The gather motion (segments sending data to the master) received 1 row . The total elapsed time for this operation was 0.537 milliseconds.

Determining the Query Optimizer

You can view EXPLAIN output to determine if GPORCA is enabled for the query plan and whether GPORCA or the Postgres query optimizer generated the explain plan. The information appears at the end of the EXPLAIN output. The Settings line displays the setting of the server configuration parameter OPTIMIZER. The Optimizer status line displays whether GPORCA or the Postgres query optimizer generated the explain plan.

For these two example query plans, GPORCA is enabled, the server configuration parameter OPTIMIZER is on. For the first plan, GPORCA generated the EXPLAIN plan. For the second plan, Greenplum Database fell back to the Postgres query optimizer to generate the query plan.

QUERY PLAN ------------------------------------------------------------------------------------ Aggregate (cost=0.00..296.14 rows=1 width=8) -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..295.10 rows=1 width=8) -> Aggregate (cost=0.00..294.10 rows=1 width=8) -> Seq Scan on part (cost=0.00..97.69 rows=100040 width=1) Settings: optimizer=on Optimizer status: Pivotal Optimizer (GPORCA) version 1.584 (5 rows) explain select count(*) from part; QUERY PLAN ---------------------------------------------------------------------------------------- Aggregate (cost=3519.05..3519.06 rows=1 width=8) -> Gather Motion 2:1 (slice1; segments: 2) (cost=3518.99..3519.03 rows=1 width=8) -> Aggregate (cost=3518.99..3519.00 rows=1 width=8) -> Seq Scan on part (cost=0.00..3018.79 rows=100040 width=1) Settings: optimizer=on Optimizer status: Postgres query optimizer (5 rows)

For this query, the server configuration parameter OPTIMIZER is off.explain select count(*) from part; QUERY PLAN ---------------------------------------------------------------------------------------- Aggregate (cost=3519.05..3519.06 rows=1 width=8) -> Gather Motion 2:1 (slice1; segments: 2) (cost=3518.99..3519.03 rows=1 width=8) -> Aggregate (cost=3518.99..3519.00 rows=1 width=8) -> Seq Scan on part (cost=0.00..3018.79 rows=100040 width=1) Settings: optimizer=off Optimizer status: Postgres query optimizer (5 rows)

Examining Query Plans to Solve Problems

If a query performs poorly, examine its query plan and ask the following questions: