Committed by Omer Arap on Aug 28, 2017

GPORCA should not spend time extracting column statistics that are not
needed for cardinality estimation. This commit eliminates the overhead
of requesting and generating statistics for columns that are not used
in cardinality estimation.

For example, given the table:
`CREATE TABLE foo (a int, b int, c int);`

For table `foo`, the query below needs statistics only for column `a`,
which is the distribution column, and column `c`, which is the column
used in the `WHERE` clause.
`select * from foo where c=2;`
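A minimal C++ sketch of the column selection described above, assuming a simplified model in which a relation is just a list of column names plus its distribution columns. The `Relation` struct and the `ColumnsNeedingStats` / `WidthOnlyColumns` functions are hypothetical illustrations, not GPORCA's actual metadata classes or API.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a relation; not GPORCA's real classes.
struct Relation {
    std::vector<std::string> columns;        // column names in table order
    std::set<std::string> distributionCols;  // e.g. {"a"} for table foo
};

// Columns whose full statistics are needed for cardinality estimation:
// the distribution columns plus every column referenced by a predicate.
std::set<std::string> ColumnsNeedingStats(const Relation& rel,
                                          const std::set<std::string>& predicateCols) {
    for (const std::string& col : predicateCols) {
        // Sanity check: predicates may only reference columns of this relation.
        assert(std::find(rel.columns.begin(), rel.columns.end(), col) != rel.columns.end());
    }
    std::set<std::string> needed = rel.distributionCols;
    needed.insert(predicateCols.begin(), predicateCols.end());
    return needed;
}

// Remaining columns: for these, only the width has to be provided.
std::vector<std::string> WidthOnlyColumns(const Relation& rel,
                                          const std::set<std::string>& predicateCols) {
    const std::set<std::string> needed = ColumnsNeedingStats(rel, predicateCols);
    std::vector<std::string> widthOnly;
    for (const std::string& col : rel.columns) {
        if (needed.count(col) == 0) {
            widthOnly.push_back(col);
        }
    }
    return widthOnly;
}
```

For `foo` with distribution column `a` and the predicate `c=2`, `ColumnsNeedingStats` yields `{a, c}` and `WidthOnlyColumns` yields `{b}`.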

However, before this commit, the column statistics for column `b` were
also computed and passed along for cardinality estimation, even though
the only information the optimizer needs about `b` is its `width`. To
obtain that single value, the full set of statistics for the column was
transferred.

This commit, together with its counterpart commit in GPDB, ensures that
column width information is passed in, and extracted from, the
`dxl:Relation` metadata.
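To illustrate why width alone is enough for columns such as `b`, here is a hedged C++ sketch in which every column entry in the relation metadata carries a width, while full statistics are present only for the columns that cardinality estimation needs. The `ColumnMetadata` struct, its fields, and the `EstimatedRowWidth` / `EstimatedOutputBytes` helpers are hypothetical and do not reflect the real `dxl:Relation` schema or GPORCA's cost model.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-column entry in relation metadata. The width is always
// present; full statistics exist only for columns used in cardinality
// estimation. Field names are illustrative, not GPORCA's DXL attributes.
struct ColumnMetadata {
    std::string name;
    double width;       // average width in bytes
    bool hasFullStats;  // true only for distribution/predicate columns
};

// Average width of an output row projecting the given columns.
// Only the width is read, so columns without full statistics still count.
double EstimatedRowWidth(const std::vector<ColumnMetadata>& projected) {
    double total = 0.0;
    for (const ColumnMetadata& col : projected) {
        assert(col.width >= 0.0);
        total += col.width;
    }
    return total;
}

// Estimated output size in bytes: estimated row count times row width.
double EstimatedOutputBytes(double estimatedRows,
                            const std::vector<ColumnMetadata>& projected) {
    return estimatedRows * EstimatedRowWidth(projected);
}
```

For `select * from foo where c=2;`, the row count would come from the statistics of `c` (and `a`), while the output width can use the width of `b` without any of its other statistics.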

Preliminary results for short-running queries show up to a 65x
performance improvement.

Signed-off-by: Jemish Patel <jpatel@pivotal.io>

05a26924
