Unverified commit 9363718d, authored by Hans Zeller, committed by GitHub

Experimental cost model update (port from 6X) (#11115)

This is a cherry-pick of the change from PR https://github.com/greenplum-db/gporca/pull/607

Avoid costing change for IN predicates on btree indexes

Commit e5f1716 changed the way we handle IN predicates on indexes: it
now uses a more efficient array comparison instead of treating the IN
list like an OR predicate. A side effect is that the cost function,
CCostModelGPDB::CostBitmapTableScan, now goes through a different code
path, using the "small NDV" or "large NDV" costing method. This produces
very high cost estimates when the NDV increases beyond 2, so we
basically never choose an index for these cases, even though a btree
index used in a bitmap scan isn't very sensitive to the NDV.

To avoid this, we go back to the old formula we used before commit e5f1716.
The fix is restricted to IN predicates on btree indexes, used in a bitmap
scan.
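
In rough terms, the new routing decision in CostBitmapTableScan can be
sketched in Python as follows (names are illustrative; the actual C++
condition is in the diff below):

    def uses_old_in_list_formula(index_type, is_array_cmp, rows,
                                 experimental_model_enabled):
        # IN predicates (scalar array comparisons) on btree indexes with
        # rows > 2 fall back to the pre-e5f1716 formula, unless the
        # experimental cost model is enabled via the
        # EopttraceCalibratedBitmapIndexCostModel trace flag.
        return (index_type == "btree" and is_array_cmp and rows > 2.0
                and not experimental_model_enabled)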

Add an MDP for a larger IN list, using a btree index on an AO table

Misc. changes to the calibration test program

- Added tests for btree indexes (btree_scan_tests).
- Changed data distribution so that all column values range from 1...n.
- Parameter values for test queries are now proportional to selectivity;
  a parameter value of 0 produces a selectivity of 0% (see the sketch
  after this list).
- Changed the logic to fake statistics somewhat; hopefully this will
  lead to more precise estimates. Incorporated the changes to the
  data distribution with no more 0 values. Added fake stats for
  unique columns.
- Headers of tests now use semicolons to separate parts, to give
  a nicer output when pasting into Google Docs.
- Some formatting changes.
- Log fallbacks.
- When using existing tables, the program now determines the table
  structure (heap or append-only) and the row count.
- Split off two very slow tests into separate test units. These are
  not included when running "all" tests; they have to be run
  explicitly.
- Add btree join tests; rename "bitmap_join_tests" to "index_join_tests"
  and run both bitmap and btree joins
- Update min and max parameter values to cover a range that includes,
  or at least is closer to, the cross-over between index and table scan
- Remove the "high NDV" tests, since the ranges in the general test
  now include both low and high NDV cases (<= and > 200)
- Print out selectivity of each query, if available
- Suppress standard deviation output when we execute queries only once
- Set search path when connecting
- Decrease the parameter range when running bitmap scan tests on
  heap tables
- Run btree scan tests only on AO tables; these tests are not designed
  for testing index scans
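
A hypothetical illustration of the parameter-to-selectivity mapping,
assuming column values uniformly distributed over 1..n and a predicate of
the form "col <= parameter" (the actual queries in cal_bitmap_test.py may
differ):

    def selectivity(parameter, n):
        # With values 1..n, "col <= parameter" selects parameter of n rows,
        # so parameter 0 yields 0% selectivity and parameter n yields 100%.
        return parameter / n

    assert selectivity(0, 1000000) == 0.0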

Updates to the experimental cost model, new calibration

1. Simplify some of the formulas; the calibration process seemed to justify
   that. We might have to revisit if problems come up. Changes:
   - Rewrite some of the formulas so the costs per row and costs per byte
     are easier to see
   - Make the cost for the width directly proportional
   - Unify the formula for scans and joins, use the same per-byte costs
     and make NDV-dependent costs proportional to num_rebinds * dNDV,
     except for the logic in item 3.

   That makes the cost for the new experimental cost model a simple linear formula:

   num_rebinds * ( rows * c1 + rows * width * c2 + ndv * c3 + bitmap_union_cost + c4 ) + c5

   We have 5 constants, c1 ... c5:

   c1: cost per row (rows on one segment)
   c2: cost per byte
   c3: cost per distinct value (total NDV on all segments)
   c4: cost per rebind
   c5: initialization cost
   bitmap_union_cost: see item 3 below
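
   A minimal Python sketch of this formula (parameter names are illustrative;
   the calibrated constant values are in the diff below):

       def experimental_scan_cost(num_rebinds, rows, width, ndv,
                                  bitmap_union_cost, c1, c2, c3, c4, c5):
           # num_rebinds * (rows*c1 + rows*width*c2 + ndv*c3
           #                + bitmap_union_cost + c4) + c5
           per_rebind = (rows * c1 + rows * width * c2 + ndv * c3 +
                         bitmap_union_cost + c4)
           return num_rebinds * per_rebind + c5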

2. Recalibrate some of the cost parameters, using the updated calibration
   program src/backend/gporca/scripts/cal_bitmap_test.py

3. Add a cost penalty for bitmap index scans on heap tables. The added
   cost takes the form bitmap_union_cost = <base table rows> * (NDV-1) * c6.

   The reason for this is, as others have pointed out, that heap tables
   lead to much larger bit vectors, since their CTIDs are more spaced out
   than those of AO tables. The main factor seems to be the cost of unioning
   these bit vectors, and that cost is proportional to the number of bitmaps
   minus one and the size of the bitmaps, which is approximated here by the
   number of rows in the table.

   Note that because we use (NDV-1) in the formula, this penalty does not
   apply to usual index joins, which have an NDV of 1 per rebind. This is
   consistent with what we see in measurements and it also seems reasonable,
   since we don't have to union bitmaps in this case.
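
   A sketch of this penalty, mirroring the condition in the diff below (it
   applies only to bitmap indexes on heap tables and vanishes for NDV <= 1):

       def bitmap_union_cost(base_table_rows, ndv, c6, is_ao_table, index_type):
           # Heap tables have sparser CTIDs, hence larger bit vectors; the
           # dominant cost is unioning (NDV - 1) bitmaps whose size is
           # approximated by the number of rows in the base table.
           if is_ao_table or index_type != "bitmap" or ndv <= 1.0:
               return 0.0
           return base_table_rows * (ndv - 1.0) * c6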

4. Fix to select CostModelGPDB for the 'experimental' model, as we do in 5X.

5. Calibrate the constants involved (c1 ... c6), using the calibration program
   and running experiments with heap and append-only tables on a laptop and
   also on a Linux cluster with 24 segments. Also run some other workloads
   for validation.

6. Give a small initial advantage to bitmap scans, so they will be chosen over
   table scans for small tables. Otherwise, small queries would have more or
   less random plans, all of which cost around 431, the value of the initial
   cost. Added a 10% advantage for the bitmap scan.
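
   In the diff below this appears as a 0.9 multiplier on the initialization
   cost constant; a one-line sketch:

       c5 = c5 * 0.9  # init_cost_advantage_for_bitmap_scan: 10% head start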

* Port calibration program to Python 3

- Used the 2to3 program to do the basics.
- The version parameter in argparse is no longer supported.
- Needs an additional option in the connection string to keep the search path.
- The dbconn.execSQL call can no longer be used to get a cursor;
  this was probably a non-observable defect in the Python 2 version.
- Needed to use // (floor division) in some cases (see the sketch below).
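
Two of the mechanical changes, sketched (these are general Python 2-to-3
differences, shown with the script's own import as an example):

    import sys
    try:
        from gppylib.db import dbconn
    except ImportError as e:   # Python 2 spelled this "except ImportError, e:"
        sys.exit('ERROR: Cannot import modules. Detail: ' + str(e))

    # In Python 3, "/" is always true division; use "//" where the old
    # code relied on integer division:
    half = 1000001 // 2        # 500000; 1000001 / 2 would be 500000.5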
Co-authored-by: David Kimura <dkimura@vmware.com>
Parent bfcc63e1
......@@ -477,7 +477,7 @@ ICostModel *
COptTasks::GetCostModel(CMemoryPool *mp, ULONG num_segments)
{
ICostModel *cost_model = NULL;
if (OPTIMIZER_GPDB_CALIBRATED >= optimizer_cost_model)
if (optimizer_cost_model >= OPTIMIZER_GPDB_CALIBRATED)
{
cost_model = GPOS_NEW(mp) CCostModelGPDB(mp, num_segments);
}
......
<?xml version="1.0" encoding="UTF-8"?>
<dxl:DXLMessage xmlns:dxl="http://greenplum.com/dxl/2010/12/">
<dxl:Comment><![CDATA[
Test case: B-tree index and a somewhat larger IN list predicate
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int) with (appendonly= true) distributed randomly;
INSERT INTO test SELECT * FROM generate_series(1,1000000);
CREATE INDEX test_index ON test(a);
set optimizer_enumerate_plans to on;
SELECT * FROM test WHERE a in (2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40);
Expect a plan with a bitmap index scan
]]>
</dxl:Comment>
<dxl:Thread Id="0">
<dxl:OptimizerConfig>
<dxl:EnumeratorConfig Id="0" PlanSamples="0" CostThreshold="0"/>
<dxl:StatisticsConfig DampingFactorFilter="0.750000" DampingFactorJoin="0.010000" DampingFactorGroupBy="0.750000" MaxStatsBuckets="100"/>
<dxl:CTEConfig CTEInliningCutoff="0"/>
<dxl:WindowOids RowNumber="7000" Rank="7001"/>
<dxl:CostModelConfig CostModelType="1" SegmentsForCosting="3">
<dxl:CostParams>
<dxl:CostParam Name="NLJFactor" Value="1024.000000" LowerBound="1023.500000" UpperBound="1024.500000"/>
</dxl:CostParams>
</dxl:CostModelConfig>
<dxl:Hint MinNumOfPartsToRequireSortOnInsert="2147483647" JoinArityForAssociativityCommutativity="18" ArrayExpansionThreshold="100" JoinOrderDynamicProgThreshold="10" BroadcastThreshold="100000" EnforceConstraintsOnDML="false" PushGroupByBelowSetopThreshold="10"/>
<dxl:TraceFlags Value="101013,102074,102113,102120,102146,102147,103001,103014,103015,103022,103027,103029,103037,104003,104004,104005,104006,104007,105000"/>
</dxl:OptimizerConfig>
<dxl:Metadata SystemIds="0.GPDB">
<dxl:Type Mdid="0.16.1.0" Name="bool" IsRedistributable="true" IsHashable="true" IsMergeJoinable="true" IsComposite="false" IsTextRelated="false" IsFixedLength="true" Length="1" PassByValue="true">
<dxl:EqualityOp Mdid="0.91.1.0"/>
<dxl:InequalityOp Mdid="0.85.1.0"/>
<dxl:LessThanOp Mdid="0.58.1.0"/>
<dxl:LessThanEqualsOp Mdid="0.1694.1.0"/>
<dxl:GreaterThanOp Mdid="0.59.1.0"/>
<dxl:GreaterThanEqualsOp Mdid="0.1695.1.0"/>
<dxl:ComparisonOp Mdid="0.1693.1.0"/>
<dxl:ArrayType Mdid="0.1000.1.0"/>
<dxl:MinAgg Mdid="0.0.0.0"/>
<dxl:MaxAgg Mdid="0.0.0.0"/>
<dxl:AvgAgg Mdid="0.0.0.0"/>
<dxl:SumAgg Mdid="0.0.0.0"/>
<dxl:CountAgg Mdid="0.2147.1.0"/>
</dxl:Type>
<dxl:Type Mdid="0.23.1.0" Name="int4" IsRedistributable="true" IsHashable="true" IsMergeJoinable="true" IsComposite="false" IsTextRelated="false" IsFixedLength="true" Length="4" PassByValue="true">
<dxl:EqualityOp Mdid="0.96.1.0"/>
<dxl:InequalityOp Mdid="0.518.1.0"/>
<dxl:LessThanOp Mdid="0.97.1.0"/>
<dxl:LessThanEqualsOp Mdid="0.523.1.0"/>
<dxl:GreaterThanOp Mdid="0.521.1.0"/>
<dxl:GreaterThanEqualsOp Mdid="0.525.1.0"/>
<dxl:ComparisonOp Mdid="0.351.1.0"/>
<dxl:ArrayType Mdid="0.1007.1.0"/>
<dxl:MinAgg Mdid="0.2132.1.0"/>
<dxl:MaxAgg Mdid="0.2116.1.0"/>
<dxl:AvgAgg Mdid="0.2101.1.0"/>
<dxl:SumAgg Mdid="0.2108.1.0"/>
<dxl:CountAgg Mdid="0.2147.1.0"/>
</dxl:Type>
<dxl:Type Mdid="0.26.1.0" Name="oid" IsRedistributable="true" IsHashable="true" IsMergeJoinable="true" IsComposite="false" IsTextRelated="false" IsFixedLength="true" Length="4" PassByValue="true">
<dxl:EqualityOp Mdid="0.607.1.0"/>
<dxl:InequalityOp Mdid="0.608.1.0"/>
<dxl:LessThanOp Mdid="0.609.1.0"/>
<dxl:LessThanEqualsOp Mdid="0.611.1.0"/>
<dxl:GreaterThanOp Mdid="0.610.1.0"/>
<dxl:GreaterThanEqualsOp Mdid="0.612.1.0"/>
<dxl:ComparisonOp Mdid="0.356.1.0"/>
<dxl:ArrayType Mdid="0.1028.1.0"/>
<dxl:MinAgg Mdid="0.2118.1.0"/>
<dxl:MaxAgg Mdid="0.2134.1.0"/>
<dxl:AvgAgg Mdid="0.0.0.0"/>
<dxl:SumAgg Mdid="0.0.0.0"/>
<dxl:CountAgg Mdid="0.2147.1.0"/>
</dxl:Type>
<dxl:Type Mdid="0.27.1.0" Name="tid" IsRedistributable="true" IsHashable="false" IsMergeJoinable="true" IsComposite="false" IsTextRelated="false" IsFixedLength="true" Length="6" PassByValue="false">
<dxl:EqualityOp Mdid="0.387.1.0"/>
<dxl:InequalityOp Mdid="0.402.1.0"/>
<dxl:LessThanOp Mdid="0.2799.1.0"/>
<dxl:LessThanEqualsOp Mdid="0.2801.1.0"/>
<dxl:GreaterThanOp Mdid="0.2800.1.0"/>
<dxl:GreaterThanEqualsOp Mdid="0.2802.1.0"/>
<dxl:ComparisonOp Mdid="0.2794.1.0"/>
<dxl:ArrayType Mdid="0.1010.1.0"/>
<dxl:MinAgg Mdid="0.2798.1.0"/>
<dxl:MaxAgg Mdid="0.2797.1.0"/>
<dxl:AvgAgg Mdid="0.0.0.0"/>
<dxl:SumAgg Mdid="0.0.0.0"/>
<dxl:CountAgg Mdid="0.2147.1.0"/>
</dxl:Type>
<dxl:RelationStatistics Mdid="2.65590.1.0" Name="test" Rows="1000000.000000" EmptyRelation="false"/>
<dxl:Relation Mdid="0.65590.1.0" Name="test" IsTemporary="false" HasOids="false" StorageType="AppendOnly, Row-oriented" DistributionPolicy="Random" Keys="3,1" NumberLeafPartitions="0">
<dxl:Columns>
<dxl:Column Name="a" Attno="1" Mdid="0.23.1.0" Nullable="true" ColWidth="4">
<dxl:DefaultValue/>
</dxl:Column>
<dxl:Column Name="ctid" Attno="-1" Mdid="0.27.1.0" Nullable="false" ColWidth="6">
<dxl:DefaultValue/>
</dxl:Column>
<dxl:Column Name="tableoid" Attno="-7" Mdid="0.26.1.0" Nullable="false" ColWidth="4">
<dxl:DefaultValue/>
</dxl:Column>
<dxl:Column Name="gp_segment_id" Attno="-8" Mdid="0.23.1.0" Nullable="false" ColWidth="4">
<dxl:DefaultValue/>
</dxl:Column>
</dxl:Columns>
<dxl:IndexInfoList>
<dxl:IndexInfo Mdid="0.65597.1.0" IsPartial="false"/>
</dxl:IndexInfoList>
<dxl:Triggers/>
<dxl:CheckConstraints/>
</dxl:Relation>
<dxl:ColumnStatistics Mdid="1.65590.1.0.0" Name="a" Width="4.000000" NullFreq="0.000000" NdvRemain="0.000000" FreqRemain="0.000000" ColStatsMissing="false">
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="17"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="10306"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="10306"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="20471"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="20471"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="30741"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="30741"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="40319"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="40319"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="49298"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="49298"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="59586"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="59586"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="68863"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="68863"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="78937"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="78937"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="88304"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="88304"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="98153"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="98153"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="106833"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="106833"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="117028"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="117028"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="126437"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="126437"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="135757"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="135757"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="146159"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="146159"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="156462"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="156462"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="166921"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="166921"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="176662"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="176662"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="187330"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="187330"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="197564"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="197564"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="207628"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="207628"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="217096"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="217096"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="226598"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="226598"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="237177"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="237177"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="247200"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="247200"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="256851"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="256851"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="266527"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="266527"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="275749"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="275749"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="285119"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="285119"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="294314"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="294314"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="304518"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="304518"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="313736"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="313736"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="323207"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="323207"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="332637"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="332637"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="342615"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="342615"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="352900"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="352900"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="362533"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="362533"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="373275"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="373275"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="382645"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="382645"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="393599"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="393599"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="403209"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="403209"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="413266"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="413266"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="423082"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="423082"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="433489"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="433489"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="443422"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="443422"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="454175"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="454175"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="463851"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="463851"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="474315"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="474315"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="484033"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="484033"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="494224"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="494224"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="503881"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="503881"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="514428"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="514428"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="524021"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="524021"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="534314"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="534314"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="543310"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="543310"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="553269"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="553269"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="562885"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="562885"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="573058"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="573058"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="581999"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="581999"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="592706"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="592706"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="603299"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="603299"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="612571"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="612571"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="623179"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="623179"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="632707"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="632707"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="643068"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="643068"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="652712"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="652712"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="662536"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="662536"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="671555"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="671555"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="682479"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="682479"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="692433"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="692433"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="702219"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="702219"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="711840"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="711840"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="721237"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="721237"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="730068"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="730068"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="739720"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="739720"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="750644"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="750644"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="760000"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="760000"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="769992"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="769992"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="780388"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="780388"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="789522"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="789522"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="799950"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="799950"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="809673"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="809673"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="819813"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="819813"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="830542"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="830542"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="840877"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="840877"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="850205"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="850205"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="860368"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="860368"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="869389"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="869389"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="879235"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="879235"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="889190"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="889190"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="899305"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="899305"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="909272"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="909272"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="919445"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="919445"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="929744"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="929744"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="939929"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="939929"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="949719"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="949719"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="959390"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="959390"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="969304"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="969304"/>
<dxl:UpperBound Closed="false" TypeMdid="0.23.1.0" Value="979160"/>
</dxl:StatsBucket>
<dxl:StatsBucket Frequency="0.010000" DistinctValues="10000.000000">
<dxl:LowerBound Closed="true" TypeMdid="0.23.1.0" Value="979160"/>
<dxl:UpperBound Closed="true" TypeMdid="0.23.1.0" Value="998114"/>
</dxl:StatsBucket>
</dxl:ColumnStatistics>
<dxl:Index Mdid="0.65597.1.0" Name="test_index" IsClustered="false" IndexType="B-tree" IndexItemType="0.2283.1.0" KeyColumns="0" IncludedColumns="0,1,2,3">
<dxl:Opfamilies>
<dxl:Opfamily Mdid="0.1976.1.0"/>
</dxl:Opfamilies>
</dxl:Index>
<dxl:GPDBScalarOp Mdid="0.96.1.0" Name="=" ComparisonType="Eq" ReturnsNullOnNullInput="true" IsNDVPreserving="false">
<dxl:LeftType Mdid="0.23.1.0"/>
<dxl:RightType Mdid="0.23.1.0"/>
<dxl:ResultType Mdid="0.16.1.0"/>
<dxl:OpFunc Mdid="0.65.1.0"/>
<dxl:Commutator Mdid="0.96.1.0"/>
<dxl:InverseOp Mdid="0.518.1.0"/>
<dxl:Opfamilies>
<dxl:Opfamily Mdid="0.1976.1.0"/>
<dxl:Opfamily Mdid="0.1977.1.0"/>
<dxl:Opfamily Mdid="0.3027.1.0"/>
</dxl:Opfamilies>
</dxl:GPDBScalarOp>
</dxl:Metadata>
<dxl:Query>
<dxl:OutputColumns>
<dxl:Ident ColId="1" ColName="a" TypeMdid="0.23.1.0"/>
</dxl:OutputColumns>
<dxl:CTEList/>
<dxl:LogicalSelect>
<dxl:ArrayComp OperatorName="=" OperatorMdid="0.96.1.0" OperatorType="Any">
<dxl:Ident ColId="1" ColName="a" TypeMdid="0.23.1.0"/>
<dxl:Array ArrayType="0.1007.1.0" ElementType="0.23.1.0" MultiDimensional="false">
<dxl:ConstValue TypeMdid="0.23.1.0" Value="2"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="4"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="6"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="8"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="10"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="12"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="14"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="16"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="18"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="20"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="22"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="24"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="26"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="28"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="30"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="32"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="34"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="36"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="38"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="40"/>
</dxl:Array>
</dxl:ArrayComp>
<dxl:LogicalGet>
<dxl:TableDescriptor Mdid="0.65590.1.0" TableName="test">
<dxl:Columns>
<dxl:Column ColId="1" Attno="1" ColName="a" TypeMdid="0.23.1.0" ColWidth="4"/>
<dxl:Column ColId="2" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0" ColWidth="6"/>
<dxl:Column ColId="3" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0" ColWidth="4"/>
<dxl:Column ColId="4" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0" ColWidth="4"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:LogicalGet>
</dxl:LogicalSelect>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="2">
<dxl:GatherMotion InputSegments="0,1,2" OutputSegments="-1">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="387.988391" Rows="13.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="0" Alias="a">
<dxl:Ident ColId="0" ColName="a" TypeMdid="0.23.1.0"/>
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:SortingColumnList/>
<dxl:BitmapTableScan>
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="387.988165" Rows="13.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="0" Alias="a">
<dxl:Ident ColId="0" ColName="a" TypeMdid="0.23.1.0"/>
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:RecheckCond>
<dxl:ArrayComp OperatorName="=" OperatorMdid="0.96.1.0" OperatorType="Any">
<dxl:Ident ColId="0" ColName="a" TypeMdid="0.23.1.0"/>
<dxl:Array ArrayType="0.1007.1.0" ElementType="0.23.1.0" MultiDimensional="false">
<dxl:ConstValue TypeMdid="0.23.1.0" Value="2"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="4"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="6"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="8"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="10"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="12"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="14"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="16"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="18"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="20"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="22"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="24"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="26"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="28"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="30"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="32"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="34"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="36"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="38"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="40"/>
</dxl:Array>
</dxl:ArrayComp>
</dxl:RecheckCond>
<dxl:BitmapIndexProbe>
<dxl:IndexCondList>
<dxl:ArrayComp OperatorName="=" OperatorMdid="0.96.1.0" OperatorType="Any">
<dxl:Ident ColId="0" ColName="a" TypeMdid="0.23.1.0"/>
<dxl:Array ArrayType="0.1007.1.0" ElementType="0.23.1.0" MultiDimensional="false">
<dxl:ConstValue TypeMdid="0.23.1.0" Value="2"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="4"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="6"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="8"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="10"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="12"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="14"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="16"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="18"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="20"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="22"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="24"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="26"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="28"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="30"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="32"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="34"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="36"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="38"/>
<dxl:ConstValue TypeMdid="0.23.1.0" Value="40"/>
</dxl:Array>
</dxl:ArrayComp>
</dxl:IndexCondList>
<dxl:IndexDescriptor Mdid="0.65597.1.0" IndexName="test_index"/>
</dxl:BitmapIndexProbe>
<dxl:TableDescriptor Mdid="0.65590.1.0" TableName="test">
<dxl:Columns>
<dxl:Column ColId="0" Attno="1" ColName="a" TypeMdid="0.23.1.0" ColWidth="4"/>
<dxl:Column ColId="1" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0" ColWidth="6"/>
<dxl:Column ColId="2" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0" ColWidth="4"/>
<dxl:Column ColId="3" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0" ColWidth="4"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:BitmapTableScan>
</dxl:GatherMotion>
</dxl:Plan>
</dxl:Thread>
</dxl:DXLMessage>
......@@ -608,7 +608,7 @@
<dxl:Plan Id="0" SpaceSize="2">
<dxl:GatherMotion InputSegments="0,1,2" OutputSegments="-1">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="431.757244" Rows="983.264660" Width="4"/>
<dxl:Cost StartupCost="0" TotalCost="400.107345" Rows="983.264660" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="0" Alias="a">
......@@ -619,7 +619,7 @@
<dxl:SortingColumnList/>
<dxl:BitmapTableScan>
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="431.740148" Rows="983.264660" Width="4"/>
<dxl:Cost StartupCost="0" TotalCost="400.090249" Rows="983.264660" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="0" Alias="a">
......
......@@ -26,6 +26,7 @@
#include "gpopt/operators/CPhysicalMotion.h"
#include "gpopt/operators/CPhysicalPartitionSelector.h"
#include "gpopt/operators/CPredicateUtils.h"
#include "gpopt/operators/CScalarBitmapIndexProbe.h"
#include "naucrates/statistics/CStatisticsUtils.h"
#include "gpopt/operators/CExpression.h"
#include "gpdbcost/CCostModelGPDB.h"
......@@ -1618,6 +1619,18 @@ CCostModelGPDB::CostBitmapTableScan(CMemoryPool *mp, CExpressionHandle &exprhdl,
CColRefSet *pcrsUsed = pexprIndexCond->DeriveUsedColumns();
CColRefSet *outerRefs = exprhdl.DeriveOuterReferences();
CColRefSet *pcrsLocalUsed = GPOS_NEW(mp) CColRefSet(mp, *pcrsUsed);
IMDIndex::EmdindexType indexType = IMDIndex::EmdindSentinel;
if (COperator::EopScalarBitmapIndexProbe == pexprIndexCond->Pop()->Eopid())
{
indexType = CScalarBitmapIndexProbe::PopConvert(pexprIndexCond->Pop())
->Pindexdesc()
->IndexType();
}
BOOL isInPredOnBtreeIndex =
(IMDIndex::EmdindBtree == indexType &&
COperator::EopScalarArrayCmp == (*pexprIndexCond)[0]->Pop()->Eopid());
// subtract outer references from the used colrefs, so we can see
// how many colrefs are used for this table
......@@ -1632,9 +1645,17 @@ CCostModelGPDB::CostBitmapTableScan(CMemoryPool *mp, CExpressionHandle &exprhdl,
if (COperator::EopScalarBitmapIndexProbe !=
pexprIndexCond->Pop()->Eopid() ||
1 < pcrsLocalUsed->Size())
1 < pcrsLocalUsed->Size() ||
(isInPredOnBtreeIndex && rows > 2.0 &&
!GPOS_FTRACE(EopttraceCalibratedBitmapIndexCostModel)))
{
// child is Bitmap AND/OR, or we use Multi column index
// Child is Bitmap AND/OR, or we use Multi column index or this is an IN predicate
// that's used with the "calibrated" cost model.
// Handling the IN predicate in this code path is to avoid plan regressions from
// earlier versions of the code that treated IN predicates like ORs and therefore
// also handled them in this code path. This is especially noticeable for btree
// indexes that often have a high NDV, because the small/large NDV cost model
// produces very high cost for cases with a higher NDV.
const CDouble dIndexFilterCostUnit =
pcmgpdb->GetCostModelParams()
->PcpLookup(CCostModelParamsGPDB::EcpIndexFilterCostUnit)
......@@ -1671,6 +1692,11 @@ CCostModelGPDB::CostBitmapTableScan(CMemoryPool *mp, CExpressionHandle &exprhdl,
// if the expression is const table get, the pcrsUsed is empty
// so we use minimum value MinDistinct for dNDV in that case.
CDouble dNDV = CHistogram::MinDistinct;
CDouble dNDVThreshold =
pcmgpdb->GetCostModelParams()
->PcpLookup(CCostModelParamsGPDB::EcpBitmapNDVThreshold)
->Get();
if (rows < 1.0)
{
// if we aren't accessing a row every rebind, then don't charge a cost for those cases where we don't have a row
......@@ -1698,10 +1724,7 @@ CCostModelGPDB::CostBitmapTableScan(CMemoryPool *mp, CExpressionHandle &exprhdl,
if (!GPOS_FTRACE(EopttraceCalibratedBitmapIndexCostModel))
{
CDouble dNDVThreshold =
pcmgpdb->GetCostModelParams()
->PcpLookup(CCostModelParamsGPDB::EcpBitmapNDVThreshold)
->Get();
// optimizer_cost_model = 'calibrated'
if (dNDVThreshold <= dNDV)
{
result = CostBitmapLargeNDV(pcmgpdb, pci, dNDV);
......@@ -1713,44 +1736,66 @@ CCostModelGPDB::CostBitmapTableScan(CMemoryPool *mp, CExpressionHandle &exprhdl,
}
else
{
// optimizer_cost_model = 'experimental'
CDouble dBitmapIO =
pcmgpdb->GetCostModelParams()
->PcpLookup(CCostModelParamsGPDB::EcpBitmapIOCostSmallNDV)
->Get();
CDouble dInitScan =
CDouble c5_dInitScan =
pcmgpdb->GetCostModelParams()
->PcpLookup(CCostModelParamsGPDB::EcpInitScanFactor)
->Get();
CDouble c3_dBitmapPageCost =
pcmgpdb->GetCostModelParams()
->PcpLookup(CCostModelParamsGPDB::EcpBitmapPageCost)
->Get();
BOOL isAOTable = CPhysicalScan::PopConvert(exprhdl.Pop())
->Ptabdesc()
->IsAORowOrColTable();
// some cost constants determined with the cal_bitmap_test.py script
CDouble c1_cost_per_row(0.03);
CDouble c2_cost_per_byte(0.0001);
CDouble bitmap_union_cost_per_distinct_value(0.000027);
CDouble init_cost_advantage_for_bitmap_scan(0.9);
if (1 < pcrsUsed->Size()) // it is a join
if (IMDIndex::EmdindBtree == indexType)
{
// The numbers below were experimentally determined using regression analysis in the cal_bitmap_test.py script
// The following dSizeCost is in the form C1 * rows + C2 * rows * width. This is because the width should have
// significantly less weight than rows as the execution time does not grow as fast in regards to width
CDouble dSizeCost =
rows * (1 + std::max(width * 0.005, 1.0)) * 0.05;
result = CCost( // cost for each byte returned by the index scan plus cost for incremental rebinds
pci->NumRebinds() * (dBitmapIO * dSizeCost + dInitRebind) +
// the BitmapPageCost * dNDV takes into account the idea of multiple tuples being on the same page.
// If you have a small NDV, the likelihood of multiple tuples matching on one page is high and so the
// page cost is reduced. Even though the page cost will decrease, the cost of accessing each tuple will
// dominate. Likewise, if the NDV is large, the num of tuples matching per page is lower so the page
// cost should be higher
dInitScan * dNDV);
// btree indexes are not sensitive to the NDV, since they don't have any bitmaps
c3_dBitmapPageCost = 0.0;
}
else
// Give the index scan a small initial advantage over the table scan, so we use indexes
// for small tables - this should avoid having table scan and index scan costs being
// very close together for many small queries.
c5_dInitScan = c5_dInitScan * init_cost_advantage_for_bitmap_scan;
// The numbers below were experimentally determined using regression analysis in the cal_bitmap_test.py script
// The following dSizeCost is in the form C1 * rows + C2 * rows * width. This is because the width should have
// significantly less weight than rows as the execution time does not grow as fast in regards to width
CDouble dSizeCost = dBitmapIO * (rows * c1_cost_per_row +
rows * width * c2_cost_per_byte);
CDouble bitmapUnionCost = 0;
if (!isAOTable && indexType == IMDIndex::EmdindBitmap && dNDV > 1.0)
{
// The numbers below were experimentally determined using regression analysis in the cal_bitmap_test.py script
CDouble dSizeCost =
rows * (1 + std::max(width * 0.005, 1.0)) * 0.001;
result =
CCost( // cost for each byte returned by the index scan plus cost for incremental rebinds
pci->NumRebinds() *
(dBitmapIO * dSizeCost + 10 * dInitRebind) * dNDV +
// similar to above, the dInitScan * dNDV takes into account the likelihood of multiple tuples per page
dInitScan * dNDV);
CDouble baseTableRows = CPhysicalScan::PopConvert(exprhdl.Pop())
->PstatsBaseTable()
->Rows();
// for bitmap index scans on heap tables, we found that there is an additional cost
// associated with unioning them that is proportional to the number of bitmaps involved
// (dNDV-1) times the width of the bitmap (proportional to the number of rows in the table)
bitmapUnionCost = std::max(0.0, dNDV.Get() - 1.0) *
baseTableRows *
bitmap_union_cost_per_distinct_value;
}
result = CCost(pci->NumRebinds() *
(dSizeCost + dNDV * c3_dBitmapPageCost +
dInitRebind + bitmapUnionCost) +
c5_dInitScan);
}
}
......
......@@ -169,7 +169,7 @@ const CDouble CCostModelParamsGPDB::DBitmapPageCostLargeNDV(83.1651);
const CDouble CCostModelParamsGPDB::DBitmapPageCostSmallNDV(204.3810);
// default bitmap page cost with no assumption about NDV
const CDouble CCostModelParamsGPDB::DBitmapPageCost(50.4381);
const CDouble CCostModelParamsGPDB::DBitmapPageCost(10);
// default threshold of NDV for bitmap costing
const CDouble CCostModelParamsGPDB::DBitmapNDVThreshold(200);
......
......@@ -76,7 +76,7 @@ using namespace gpopt;
// predicates less selective than this threshold
// (selectivity is greater than this number) lead to
// disqualification of a btree index on an AO table
#define AO_TABLE_BTREE_INDEX_SELECTIVITY_THRESHOLD 0.05
#define AO_TABLE_BTREE_INDEX_SELECTIVITY_THRESHOLD 0.10
//---------------------------------------------------------------------------
// @function:
......
#!/usr/bin/env python
#!/usr/bin/env python3
# Optimizer calibration test for bitmap indexes
#
......@@ -25,10 +25,12 @@ import argparse
import time
import re
import math
import os
import sys
try:
from gppylib.db import dbconn
except ImportError, e:
except ImportError as e:
sys.exit('ERROR: Cannot import modules. Please check that you have sourced greenplum_path.sh. Detail: ' + str(e))
# constants
......@@ -64,20 +66,27 @@ NL_JOIN = "nl_join"
NL_JOIN_PATTERN = r"Nested Loop"
NL_JOIN_PATTERN_V5 = r"Nested Loop"
OPTIMIZER_DEFAULT_PLAN = "optimizer"
FALLBACK_PLAN = "fallback"
FALLBACK_PATTERN = "Postgres query optimizer"
FALLBACK_PATTERN_V5 = "legacy query optimizer"
OPTIMIZER_DEFAULT_PLAN = "optimizer"
# global variables
# -----------------------------------------------------------------------------
glob_verbose=False
# constants
# only consider optimizer errors beyond x * sigma (standard deviation) as significant
glob_sigma_diff=3
glob_log_file=None
glob_exe_timeout=40000
glob_gpdb_major_version=7
glob_sigma_diff = 3
glob_log_file = None
glob_exe_timeout = 40000
glob_gpdb_major_version = 7
glob_dim_table_rows = 10000
# global variables that may be modified
glob_verbose = False
glob_rowcount = -1
glob_appendonly = False
# SQL statements, DDL and DML
# -----------------------------------------------------------------------------
......@@ -119,42 +128,41 @@ DISTRIBUTED BY (id);
"""
_with_appendonly = """
WITH (appendonly=true, compresslevel=5, compresstype=zlib)
WITH (appendonly=true)
"""
_create_other_tables = [ """
_create_other_tables = ["""
CREATE TABLE cal_temp_ids(f_id int, f_rand double precision) DISTRIBUTED BY (f_id);
""",
"""
"""
CREATE TABLE cal_dim(dim_id int,
dim_id2 int,
txt text)
DISTRIBUTED BY (dim_id);
""",
"""
"""
CREATE TABLE cal_bfv_dim (id integer, col2 integer) DISTRIBUTED BY (id);
""" ]
"""]
# insert into temp table. Parameters:
# - integer start value (usually 0 or 1)
# - integer stop value (suggested value is 10000000)
# - integer stop value (suggested value is 10,000,000)
_insert_into_temp = """
INSERT INTO cal_temp_ids SELECT x, random() FROM (SELECT * FROM generate_series(%d,%d)) T(x);
INSERT INTO cal_temp_ids SELECT x, random() FROM (SELECT * FROM generate_series(1,%d)) T(x);
"""
_insert_into_table = """
INSERT INTO cal_txtest
SELECT f_id,
f_id,
f_id%10,
f_id%100,
f_id%1000,
f_id%10000,
f_id%10,
f_id%100,
f_id%1000,
f_id%10000,
repeat('a', 900)
f_id%10 + 1,
f_id%100 + 1,
f_id%1000 + 1,
f_id%10000 + 1,
f_id%10 + 1,
f_id%100 + 1,
f_id%1000 + 1,
f_id%10000 + 1,
repeat('a', 960)
FROM cal_temp_ids
order by f_rand;
"""
......@@ -166,49 +174,49 @@ INSERT INTO cal_dim SELECT x, x, repeat('d', 100) FROM (SELECT * FROM generate_s
_create_index_arr = ["""
CREATE INDEX cal_txtest_i_bitmap_10 ON cal_txtest USING bitmap(bitmap10);
""",
"""
"""
CREATE INDEX cal_txtest_i_bitmap_100 ON cal_txtest USING bitmap(bitmap100);
""",
"""
"""
CREATE INDEX cal_txtest_i_bitmap_1000 ON cal_txtest USING bitmap(bitmap1000);
""",
"""
"""
CREATE INDEX cal_txtest_i_bitmap_10000 ON cal_txtest USING bitmap(bitmap10000);
""",
]
]
_create_bfv_index_arr = ["""
CREATE INDEX idx_cal_bfvtest_bitmap ON cal_bfvtest USING bitmap(id);
""",
]
]
_create_ndv_index_arr = ["""
CREATE INDEX cal_ndvtest_bitmap ON cal_ndvtest USING bitmap(val);
""",
]
]
_create_btree_indexes_ao_arr = ["""
_create_btree_indexes_arr = ["""
CREATE INDEX cal_txtest_i_btree_unique ON cal_txtest USING btree(btreeunique);
""",
"""
"""
CREATE INDEX cal_txtest_i_btree_10 ON cal_txtest USING btree(btree10);
""",
"""
"""
CREATE INDEX cal_txtest_i_btree_100 ON cal_txtest USING btree(btree100);
""",
"""
"""
CREATE INDEX cal_txtest_i_btree_1000 ON cal_txtest USING btree(btree1000);
""",
"""
"""
CREATE INDEX cal_txtest_i_btree_10000 ON cal_txtest USING btree(btree10000);
""",
"""
"""
CREATE INDEX idx_cal_bfvtest_btree ON cal_bfvtest USING btree(id);
""",
"""
"""
CREATE INDEX cal_ndvtest_btree ON cal_ndvtest USING btree(val);
""",
]
]
_analyze_table = """
ANALYZE cal_txtest;
......@@ -222,120 +230,60 @@ _allow_system_mods_v5 = """
SET allow_system_table_mods to 'dml';
"""
# Make sure pg_statistics has smooth and precise statistics, so that the cardinality estimates we get are very precise
#
# For NDVs of 100 or less, list all of them
# For NDVs of more than 100, make some dummy NDVs and 5 intervals of the same length
# So far, id and btreeunique are not yet used (staattnums 1 and 2), no stats are changed
# Make sure pg_statistics and pg_class have accurate statistics, so that the cardinality estimates we get are very precise
_fix_statistics = ["""
UPDATE pg_statistic
SET stadistinct = 10,
stakind1 = 1,
stanumbers1 = '{ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 }',
stavalues1 = '{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 3;
""",
"""
UPDATE pg_statistic
SET stadistinct = 100,
stakind1 = 1,
stanumbers1 = '{ 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01 }',
stavalues1 = '{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 }'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 4;
""",
"""
UPDATE pg_statistic
SET stadistinct = 1000,
stakind1 = 1,
stanumbers1 = '{ 0.001, 0.001, 0.001 }',
stavalues1 = '{100, 200, 300}'::int[],
stakind2 = 2,
stanumbers2 = '{}',
stavalues2 = '{0, 199, 399, 599, 799, 999}'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 5;
""",
"""
UPDATE pg_statistic
SET stadistinct = 10000,
stakind1 = 1,
stanumbers1 = '{ 0.0001, 0.0001, 0.0001 }',
stavalues1 = '{1000, 2000, 3000}'::int[],
stakind2 = 2,
stanumbers2 = '{}',
stavalues2 = '{0, 1999, 3999, 5999, 7999, 9999}'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 6;
""",
"""
UPDATE pg_statistic
SET stadistinct = 10,
stakind1 = 1,
stanumbers1 = '{ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 }',
stavalues1 = '{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 7;
""",
"""
UPDATE pg_statistic
SET stadistinct = 100,
stakind1 = 1,
stanumbers1 = '{ 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01 }',
stavalues1 = '{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 }'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 8;
""",
"""
UPDATE pg_statistic
SET stadistinct = 1000,
stakind1 = 1,
stanumbers1 = '{ 0.001, 0.001, 0.001 }',
stavalues1 = '{100, 200, 300}'::int[],
stakind2 = 2,
stanumbers2 = '{}',
stavalues2 = '{0, 199, 399, 599, 799, 999}'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 9;
""",
"""
UPDATE pg_statistic
SET stadistinct = 10000,
stakind1 = 1,
stanumbers1 = '{ 0.0001, 0.0001, 0.0001 }',
stavalues1 = '{1000, 2000, 3000}'::int[],
stakind2 = 2,
stanumbers2 = '{}',
stavalues2 = '{0, 1999, 3999, 5999, 7999, 9999}'::int[]
WHERE starelid = 'cal_txtest'::regclass AND staattnum = 10;
""",
"""
UPDATE pg_statistic
SET stadistinct = 10000,
stakind1 = 1,
stanumbers1 = '{ 0.0001, 0.0001, 0.0001 }',
stavalues1 = '{1000, 2000, 3000}'::int[],
stakind2 = 2,
stanumbers2 = '{}',
stavalues2 = '{0, 1999, 3999, 5999, 7999, 9999}'::int[]
WHERE starelid = 'cal_dim'::regclass AND staattnum = 1;
""",
"""
_update_pg_class = """
UPDATE pg_class
SET reltuples = %i
WHERE relname = '%s';
"""
# add an MCV or histogram (stakind1 = 1 or 2) and a correlation (stakind2 = 3) value
_update_pg_stats = """
UPDATE pg_statistic
SET stadistinct = 10000,
stakind1 = 1,
stanumbers1 = '{ 0.0001, 0.0001, 0.0001 }',
stavalues1 = '{1000, 2000, 3000}'::int[],
stakind2 = 2,
stanumbers2 = '{}',
stavalues2 = '{0, 1999, 3999, 5999, 7999, 9999}'::int[]
WHERE starelid = 'cal_dim'::regclass AND staattnum = 2;
""" ]
SET stadistinct = %f,
stakind1 = %d,
stanumbers1 = %s,
stavalues1 = %s,
stakind2 = 3,
stanumbers2 = '{ %f }',
stavalues2 = NULL,
stakind3 = 0,
stanumbers3 = NULL,
stavalues3 = NULL,
stakind4 = 0,
stanumbers4 = NULL,
stavalues4 = NULL
WHERE starelid = '%s'::regclass AND staattnum = %i;
"""
# columns to fix, in the format (table name, column name, attnum, ndv, num rows)
# use -1 as the NDV for unique columns, and -1 as the row count for the fact table (its actual row count is filled in at run time)
_stats_cols_to_fix = [
('cal_txtest', 'id', 1, -1, -1),
('cal_txtest', 'btreeunique', 2, -1, -1),
('cal_txtest', 'btree10', 3, 10, -1),
('cal_txtest', 'btree100', 4, 100, -1),
('cal_txtest', 'btree1000', 5, 1000, -1),
('cal_txtest', 'btree10000', 6, 10000, -1),
('cal_txtest', 'bitmap10', 7, 10, -1),
('cal_txtest', 'bitmap100', 8, 100, -1),
('cal_txtest', 'bitmap1000', 9, 1000, -1),
('cal_txtest', 'bitmap10000', 10, 10000, -1),
('cal_dim', 'dim_id', 1, -1, glob_dim_table_rows),
('cal_dim', 'dim_id2', 2, -1, glob_dim_table_rows)
]
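# Illustrative expansion (a sketch, not part of the test program): the tuple
# ('cal_txtest', 'btree10', 3, 10, -1) above is turned by
# smoothStatisticsForOneCol() into roughly this statement, installing ten
# equally frequent MCVs for attribute 3 and a correlation of 0:
#
#   UPDATE pg_statistic
#   SET stadistinct = 10.000000,
#       stakind1 = 1,
#       stanumbers1 = '{ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 }'::float[],
#       stavalues1 = '{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }'::int[],
#       stakind2 = 3,
#       stanumbers2 = '{ 0.000000 }',
#       stavalues2 = NULL, ...
#   WHERE starelid = 'cal_txtest'::regclass AND staattnum = 3;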
# deal with command line arguments
# -----------------------------------------------------------------------------
def parseargs():
parser = argparse.ArgumentParser(description=_help, version='1.0')
parser = argparse.ArgumentParser(description=_help)
parser.add_argument("tests", metavar="TEST", choices=[ [], "all", "none", "bitmap_scan_tests", "bitmap_join_tests" ], nargs="*",
help="Run these tests (all, bitmap_scan_tests, bitmap_join_tests), default is none")
parser.add_argument("tests", metavar="TEST", choices=[[], "all", "none", "bitmap_scan_tests", "btree_ao_scan_tests",
"bitmap_ndv_scan_tests", "index_join_tests", "bfv_join_tests"],
nargs="*",
help="Run these tests (all, none, bitmap_scan_tests, btree_ao_scan_tests, bitmap_ndv_scan_tests, index_join_tests, bfv_join_tests), default is none")
parser.add_argument("--create", action="store_true",
help="Create the tables to use in the test")
parser.add_argument("--execute", type=int, default="0",
......@@ -346,14 +294,14 @@ def parseargs():
help="Print more verbose output")
parser.add_argument("--logFile", default="",
help="Log diagnostic output to a file")
parser.add_argument("--host", default="localhost",
help="Host to connect to (default is localhost).")
parser.add_argument("--host", default="",
help="Host to connect to (default is localhost or $PGHOST, if set).")
parser.add_argument("--port", type=int, default="0",
help="Port on the host to connect to")
help="Port on the host to connect to (default is 0 or $PGPORT, if set)")
parser.add_argument("--dbName", default="",
help="Database name to connect to")
parser.add_argument("--appendOnly", action="store_true",
help="Create an append-only table (uses only bitmap indexes). Default is a heap table")
help="Create an append-only table. Default is a heap table")
parser.add_argument("--numRows", type=int, default="10000000",
help="Number of rows to INSERT INTO the table (default is 10 million)")
......@@ -363,6 +311,7 @@ def parseargs():
args = parser.parse_args()
return args, parser
def log_output(str):
if glob_verbose:
print(str)
......@@ -376,12 +325,15 @@ def log_output(str):
def connect(host, port_num, db_name):
try:
dburl = dbconn.DbURL(hostname=host, port=port_num, dbname=db_name)
conn = dbconn.connect(dburl, encoding="UTF8")
conn = dbconn.connect(dburl, encoding="UTF8", unsetSearchPath=False)
except Exception as e:
print("Exception during connect: %s" % e)
print(("Exception during connect: %s" % e))
quit()
return conn
def select_version(conn):
global glob_gpdb_major_version
sqlStr = "SELECT version()"
......@@ -401,6 +353,7 @@ def select_version(conn):
for row in rows:
log_output(str(row[0]))
def execute_sql(conn, sqlStr):
try:
log_output("")
......@@ -408,28 +361,47 @@ def execute_sql(conn, sqlStr):
dbconn.execSQL(conn, sqlStr)
except Exception as e:
print("")
print("Error executing query: %s; Reason: %s" % (sqlStr, e))
print(("Error executing query: %s; Reason: %s" % (sqlStr, e)))
dbconn.execSQL(conn, "abort")
def select_first_int(conn, sqlStr):
try:
log_output("")
log_output("Executing query: %s" % sqlStr)
curs = dbconn.query(conn, sqlStr)
rows = curs.fetchall()
for row in rows:
return int(row[0])
except Exception as e:
print("")
print(("Error executing query: %s; Reason: %s" % (sqlStr, e)))
dbconn.execSQL(conn, "abort")
def execute_sql_arr(conn, sqlStrArr):
for sqlStr in sqlStrArr:
execute_sql(conn, sqlStr)
def execute_and_commit_sql(conn, sqlStr):
execute_sql(conn, sqlStr)
commit_db(conn)
def commit_db(conn):
execute_sql(conn, "commit")
# run an SQL statement and return the elapsed wallclock time, in seconds
def timed_execute_sql(conn, sqlStr):
start = time.time()
execute_sql(conn, sqlStr)
num_rows = select_first_int(conn, sqlStr)
end = time.time()
elapsed_time_in_msec = round((end-start)*1000)
log_output("Elapsed time (msec): %.0f" % elapsed_time_in_msec)
return elapsed_time_in_msec
elapsed_time_in_msec = round((end - start) * 1000)
log_output("Elapsed time (msec): %d, rows: %d" % (elapsed_time_in_msec, num_rows))
return elapsed_time_in_msec, num_rows
# run an SQL statement n times, unless it takes longer than a timeout
......@@ -437,17 +409,21 @@ def timed_execute_sql(conn, sqlStr):
def timed_execute_n_times(conn, sqlStr, exec_n_times):
sum_exec_times = 0.0
sum_square_exec_times = 0.0
e = 1
e = 0
act_num_exes = exec_n_times
while e <= act_num_exes:
exec_time = timed_execute_sql(conn, sqlStr)
num_rows = -1
while e < act_num_exes:
exec_time, local_num_rows = timed_execute_sql(conn, sqlStr)
e = e + 1
sum_exec_times += exec_time
sum_square_exec_times += exec_time*exec_time
sum_square_exec_times += exec_time * exec_time
if num_rows >= 0 and local_num_rows != num_rows:
log_output("Inconsistent number of rows returned: %d and %d" % (num_rows, local_num_rows))
num_rows = local_num_rows
if exec_time > glob_exe_timeout:
# we exceeded the timeout, don't keep executing this long query
act_num_exes = e
log_output("Query %s exceeded the timeout of %d seconds" % (sqlStr, glob_exe_timeout))
e = e+1
# compute mean and standard deviation of the execution times
mean = sum_exec_times / act_num_exes
......@@ -456,7 +432,7 @@ def timed_execute_n_times(conn, sqlStr, exec_n_times):
variance = 0.0
else:
variance = sum_square_exec_times / act_num_exes - mean * mean
return (round(mean, 3), round(math.sqrt(variance), 3), act_num_exes)
return (round(mean, 3), round(math.sqrt(variance), 3), act_num_exes, num_rows)
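# Worked example (illustration only) of the E[x^2] - E[x]^2 shortcut used
# above, assuming three runs taking 10, 12 and 14 msec:
#   mean     = (10 + 12 + 14) / 3          = 12.0
#   variance = (100 + 144 + 196) / 3 - 144 = 2.667
#   stddev   = sqrt(2.667)                 = 1.633 (rounded)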
# Explain a query and find a table scan or index scan in an explain output
......@@ -474,11 +450,13 @@ def explain_index_scan(conn, sqlStr):
table_scan_pattern = TABLE_SCAN_PATTERN
index_scan_pattern = INDEX_SCAN_PATTERN
bitmap_scan_pattern = BITMAP_SCAN_PATTERN
fallback_pattern = FALLBACK_PATTERN
if (glob_gpdb_major_version) <= 5:
table_scan_pattern = TABLE_SCAN_PATTERN_V5
index_scan_pattern = INDEX_SCAN_PATTERN_V5
bitmap_scan_pattern = BITMAP_SCAN_PATTERN_V5
fallback_pattern = FALLBACK_PATTERN_V5
for row in rows:
log_output(row[0])
if re.search(TABLE_NAME_PATTERN, row[0]) or re.search(NDV_TABLE_NAME_PATTERN, row[0]):
......@@ -491,6 +469,9 @@ def explain_index_scan(conn, sqlStr):
elif re.search(table_scan_pattern, row[0]):
scan_type = TABLE_SCAN
cost = cost_from_explain_line(row[0])
elif re.search(fallback_pattern, row[0]):
log_output("*** ERROR: Fallback")
scan_type = FALLBACK_PLAN
except Exception as e:
log_output("\n*** ERROR explaining query:\n%s;\nReason: %s" % ("explain " + sqlStr, e))
......@@ -515,12 +496,14 @@ def explain_join_scan(conn, sqlStr):
table_scan_pattern = TABLE_SCAN_PATTERN
index_scan_pattern = INDEX_SCAN_PATTERN
bitmap_scan_pattern = BITMAP_SCAN_PATTERN
fallback_pattern = FALLBACK_PATTERN
if (glob_gpdb_major_version) <= 5:
hash_join_pattern = HASH_JOIN_PATTERN_V5
nl_join_pattern = NL_JOIN_PATTERN_V5
table_scan_pattern = TABLE_SCAN_PATTERN_V5
index_scan_pattern = INDEX_SCAN_PATTERN_V5
bitmap_scan_pattern = BITMAP_SCAN_PATTERN_V5
fallback_pattern = FALLBACK_PATTERN_V5
# save the cost of the join above the scan type
for row in rows:
......@@ -537,6 +520,9 @@ def explain_join_scan(conn, sqlStr):
scan_type = INDEX_SCAN
elif re.search(table_scan_pattern, row[0]):
scan_type = TABLE_SCAN
elif re.search(fallback_pattern, row[0]):
log_output("*** ERROR: Fallback")
scan_type = FALLBACK_PLAN
except Exception as e:
log_output("\n*** ERROR explaining query:\n%s;\nReason: %s" % ("explain " + sqlStr, e))
......@@ -557,12 +543,13 @@ def cost_from_explain_line(line):
# iterate over one parameterized query, using a range of parameter values, explaining and (optionally) executing the query
def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMethod, explain_method, reset_method, plan_ids, force_methods, execute_n_times):
def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMethod, explain_method, reset_method,
plan_ids, force_methods, execute_n_times):
# expects the following:
# - conn: A connection
# - lowParamValue: The lowest (integer) value to try for the parameter
# - highParamLimit: The highest (integer) value to try for the parameter + 1
# - setup: A method that runs any sql needed for setup before a particular select run, given a parameterized query and a paramter value
# - setup: A method that runs any sql needed for setup before a particular select run, given a parameterized query and a parameter value
# - parameterizeMethod: A method to generate the actual query text, given a parameterized query and a parameter value
# - explain_method: A method that takes a connection and an SQL string and returns a tuple (plan, cost)
# - reset_method: A method to reset all gucs and similar switches, to get the default plan by the optimizer
......@@ -571,7 +558,7 @@ def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMetho
# - force_methods: A list with <p> methods to force each plan id in the plan_ids array (these methods usually set gucs)
# each methods takes one parameter, the connection
# - execute_n_times: The number of times to execute the query (0 means don't execute, n>0 means execute n times)
# returns the following:
# - An explain dictionary, containing a mapping between a subset of the parameter values and result tuples, each result tuple consisting of
# <p> + 2 values:
......@@ -597,16 +584,17 @@ def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMetho
reset_method(conn)
# determine the increment
incParamValue = (highParamLimit - lowParamValue) / 10
incParamValue = (highParamLimit - lowParamValue) // 10
if incParamValue == 0:
incParamValue = 1
elif highParamLimit <= lowParamValue:
errMessages.append("Low parameter value %d must be less than high parameter limit %d" % (lowParamValue, highParamLimit))
errMessages.append(
"Low parameter value %d must be less than high parameter limit %d" % (lowParamValue, highParamLimit))
return (explainDict, execDict, errMessages)
# first part, run through the parameter values and determine the plan and cost chosen by the optimizer
for paramValue in range(lowParamValue, highParamLimit, incParamValue):
# do any setup required
setupString = setup(paramValue)
execute_sql(conn, setupString)
......@@ -615,7 +603,7 @@ def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMetho
(plan, cost) = explain_method(conn, sqlString)
explainDict[paramValue] = (plan, cost)
log_output("For param value %d the optimizer chose %s with a cost of %f" % (paramValue, plan, cost))
# look for the crossover from one plan to another
if not expCrossoverOccurred and paramValue > lowParamValue and plan != expPrevPlan:
expCrossoverOccurred = True
......@@ -628,7 +616,8 @@ def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMetho
# execute the query, if requested
if execute_n_times > 0:
timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue, OPTIMIZER_DEFAULT_PLAN, execDict, timedOutDict, errMessages)
timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue, OPTIMIZER_DEFAULT_PLAN,
execDict, timedOutDict, errMessages)
# second part, force different plans and record the costs
for plan_num in range(0, len(plan_ids)):
......@@ -640,18 +629,22 @@ def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMetho
# do any setup required
setupString = setup(paramValue)
execute_sql(conn, setupString)
# explain the query with the forced plan
sqlString = parameterizeMethod(paramValue)
(plan, cost) = explain_method(conn, sqlString)
if plan_id != plan:
errMessages.append("For parameter value %d we tried to force a %s plan but got a %s plan." % (paramValue, plan_id, plan))
log_output("For parameter value %d we tried to force a %s plan but got a %s plan." % (paramValue, plan_id, plan))
errMessages.append("For parameter value %d we tried to force a %s plan but got a %s plan." % (
paramValue, plan_id, plan))
log_output("For parameter value %d we tried to force a %s plan but got a %s plan." % (
paramValue, plan_id, plan))
# update the result dictionary
resultList = list(explainDict[paramValue])
defaultPlanCost = resultList[1]
# sanity check, the forced plan shouldn't have a cost that is lower than the default plan cost
if defaultPlanCost > cost * 1.1:
errMessages.append("For parameter value %d and forced %s plan we got a cost of %f that is lower than the default cost of %f for the default %s plan." % (paramValue, plan_id, cost, defaultPlanCost, resultList[0]))
errMessages.append(
"For parameter value %d and forced %s plan we got a cost of %f that is lower than the default cost of %f for the default %s plan." % (
paramValue, plan_id, cost, defaultPlanCost, resultList[0]))
resultList.append(cost)
explainDict[paramValue] = tuple(resultList)
log_output("For param value %d we forced %s with a cost of %f" % (paramValue, plan, cost))
......@@ -659,7 +652,8 @@ def find_crossover(conn, lowParamValue, highParamLimit, setup, parameterizeMetho
# execute the forced plan
if execute_n_times > 0:
# execute the query <execute_n_times> times and record the mean and stddev of the time in execDict
timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue, plan_id, execDict, timedOutDict, errMessages)
timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue, plan_id, execDict,
timedOutDict, errMessages)
# cleanup at exit
reset_method(conn)
......@@ -678,23 +672,26 @@ def checkForOptimizerErrors(paramValue, chosenPlan, plan_ids, execDict):
defaultExeTime = 1E6
defaultStdDev = 0.0
if (paramValue, OPTIMIZER_DEFAULT_PLAN) in execDict:
defaultExeTime, defaultStdDev = execDict[(paramValue, OPTIMIZER_DEFAULT_PLAN)]
defaultExeTime, defaultStdDev, numRows = execDict[(paramValue, OPTIMIZER_DEFAULT_PLAN)]
if (paramValue, chosenPlan) in execDict:
forcedExeTime, forcedStdDev = execDict[(paramValue, chosenPlan)]
defaultExeTime = min(defaultExeTime, forcedExeTime)
defaultStdDev = max(defaultStdDev, forcedStdDev)
forcedExeTime, forcedStdDev, numRows = execDict[(paramValue, chosenPlan)]
if forcedExeTime < defaultExeTime:
defaultExeTime = forcedExeTime
defaultStdDev = forcedStdDev
for pl in plan_ids:
if (paramValue, pl) in execDict:
altExeTime, altStdDev = execDict[(paramValue, pl)]
altExeTime, altStdDev, numRows = execDict[(paramValue, pl)]
# The execution times tend to be fairly unreliable. Try to avoid false positives by
# requiring a significantly better alternative, measured in standard deviations.
if altExeTime + glob_sigma_diff * max(defaultStdDev, altStdDev) < defaultExeTime:
optimizerError = 100.0 * (defaultExeTime - altExeTime) / defaultExeTime
# yes, plan pl is significantly better than the optimizer default choice
return (pl, round(optimizerError,1))
return (pl, round(optimizerError, 1))
elif chosenPlan == FALLBACK_PLAN:
return (FALLBACK_PLAN, -1.0)
# the optimizer chose the right plan (or at least we don't have enough evidence to the contrary)
return ("", 0.0)
......@@ -702,7 +699,7 @@ def checkForOptimizerErrors(paramValue, chosenPlan, plan_ids, execDict):
# print the results of one test run
def print_results(testTitle, explainDict, execDict, errMessages, plan_ids):
def print_results(testTitle, explainDict, execDict, errMessages, plan_ids, execute_n_times):
# print out the title of the test
print("")
print(testTitle)
......@@ -710,11 +707,11 @@ def print_results(testTitle, explainDict, execDict, errMessages, plan_ids):
exeTimes = len(execDict) > 0
# make a list of plan ids with the default plan ids as first entry
plan_ids_with_default = [ OPTIMIZER_DEFAULT_PLAN ]
plan_ids_with_default = [OPTIMIZER_DEFAULT_PLAN]
plan_ids_with_default.extend(plan_ids)
# print a header row
headerList = [ "Parameter value", "Plan chosen by optimizer", "Cost" ]
headerList = ["Parameter value", "Plan chosen by optimizer", "Cost"]
for p_id in plan_ids:
headerList.append("Cost of forced %s plan" % p_id)
if exeTimes:
......@@ -723,10 +720,12 @@ def print_results(testTitle, explainDict, execDict, errMessages, plan_ids):
headerList.append("Execution time for default plan (ms)")
for p_id in plan_ids:
headerList.append("Execution time for forced %s plan (ms)" % p_id)
headerList.append("std dev default")
for p_id in plan_ids:
headerList.append("std dev %s" % p_id)
print(", ".join(headerList))
if execute_n_times > 1:
headerList.append("Std dev default")
for p_id in plan_ids:
headerList.append("Std dev %s" % p_id)
headerList.append("Selectivity pct")
print((", ".join(headerList)))
# sort the keys of the dictionary by parameter value
sorted_params = sorted(explainDict.keys())
......@@ -735,7 +734,7 @@ def print_results(testTitle, explainDict, execDict, errMessages, plan_ids):
for p_val in sorted_params:
# add the explain-related values
vals = explainDict[p_val]
resultList = [ str(p_val) ]
resultList = [str(p_val)]
for v in vals:
resultList.append(str(v))
# add the execution-related values, if applicable
......@@ -746,32 +745,39 @@ def print_results(testTitle, explainDict, execDict, errMessages, plan_ids):
resultList.append(str(optimizerError))
stddevList = []
# our execution times will be a list of 2* (p+1) items,
# (default exe time, forced exe time plan 1 ... p, stddev for default time, stddevs for plans 1...p)
num_rows = -1
# our execution times will be a list of 2* (p+1) + 1 items,
# (default exe time, forced exe time plan 1 ... p, stddev for default time, stddevs for plans 1...p, selectivity)
# now loop over the list of p+1 plan ids
for plan_id in plan_ids_with_default:
if (p_val, plan_id) in execDict:
# we did execute the query for this, append the avg time
# right away and save the standard deviation for later
mean, stddev = execDict[(p_val, plan_id)]
mean, stddev, local_num_rows = execDict[(p_val, plan_id)]
resultList.append(str(mean))
stddevList.append(str(stddev))
if num_rows >= 0 and local_num_rows != num_rows:
errMessages.append("Inconsistent number of rows for parameter value %d: %d and %d" % (p_val, num_rows, local_num_rows))
num_rows = local_num_rows
else:
# we didn't execute this query, add blank values
resultList.append("")
stddevList.append("")
# now add the standard deviations to the end of resultList
resultList.extend(stddevList)
if execute_n_times > 1:
# now add the standard deviations to the end of resultList
resultList.extend(stddevList)
# finally, the selectivity in percent
resultList.append(str((100.0 * num_rows) / glob_rowcount))
# print a comma-separated list of result values (CSV)
print(", ".join(resultList))
print((", ".join(resultList)))
# if there are any errors, print them at the end, leaving an empty line between the result and the errors
if (len(errMessages) > 0):
print("")
print("%d diagnostic message(s):" % len(errMessages))
print(("%d diagnostic message(s):" % len(errMessages)))
for e in errMessages:
print(e)
......@@ -779,7 +785,8 @@ def print_results(testTitle, explainDict, execDict, errMessages, plan_ids):
# execute a query n times, with a guard against long-running queries,
# and record the result in execDict and any errors in errMessages
def timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue, plan_id, execDict, timedOutDict, errMessages):
def timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue, plan_id, execDict, timedOutDict,
errMessages):
# timedOutDict contains a record of queries that have previously timed out:
# plan_id -> (lowest param value for timeout, highest value for timeout, direction)
# right now we ignore low/high values and direction (whether the execution increases or decreases with
......@@ -792,17 +799,18 @@ def timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue
return
# execute the query
mean, stddev, num_execs = timed_execute_n_times(conn, sqlString, execute_n_times)
mean, stddev, num_execs, num_rows = timed_execute_n_times(conn, sqlString, execute_n_times)
# record the execution stats
execDict[(paramValue, plan_id)] = (mean, stddev)
execDict[(paramValue, plan_id)] = (mean, stddev, num_rows)
# check for timeouts
if num_execs < execute_n_times or mean > glob_exe_timeout:
# record the timeout, without worrying about low/high values or directions for now
timedOutDict[plan_id] = (paramValue, paramValue, "unknown_direction")
errMessages.append("The %s plan for parameter value %d took more than the allowed timeout, it was executed only %d time(s)" %
(plan_id, paramValue, num_execs))
errMessages.append(
"The %s plan for parameter value %d took more than the allowed timeout, it was executed only %d time(s)" %
(plan_id, paramValue, num_execs))
# Definition of various test suites
......@@ -832,44 +840,44 @@ def timed_execute_and_check_timeout(conn, sqlString, execute_n_times, paramValue
# GUC set statements
_reset_index_scan_forces = [ """
_reset_index_scan_forces = ["""
SELECT enable_xform('CXformImplementBitmapTableGet');
""",
"""
"""
SELECT enable_xform('CXformGet2TableScan');
""" ]
_force_sequential_scan = [ """
_force_sequential_scan = ["""
SELECT disable_xform('CXformImplementBitmapTableGet');
""" ]
"""]
_force_index_scan = [ """
_force_index_scan = ["""
SELECT disable_xform('CXformGet2TableScan');
""" ]
"""]
_reset_index_join_forces = [ """
_reset_index_join_forces = ["""
SELECT enable_xform('CXformPushGbBelowJoin');
""",
"""
"""
RESET optimizer_enable_indexjoin;
""",
"""
"""
RESET optimizer_enable_hashjoin;
""" ]
"""]
_force_hash_join = [ """
_force_hash_join = ["""
SELECT disable_xform('CXformPushGbBelowJoin');
""",
"""
"""
SET optimizer_enable_indexjoin to off;
""" ]
"""]
_force_index_nlj = [ """
_force_index_nlj = ["""
SELECT disable_xform('CXformPushGbBelowJoin');
""",
"""
"""
SET optimizer_enable_hashjoin to off;
""" ]
"""]
# setup statements
......@@ -882,7 +890,7 @@ ANALYZE cal_bfvtest;
ANALYZE cal_bfv_dim;
"""
_insert_into_ndv_tables= """
_insert_into_ndv_tables = """
TRUNCATE cal_ndvtest;
INSERT INTO cal_ndvtest SELECT i, i %% %d FROM (SELECT generate_series(1,1000000) i)a;
ANALYZE cal_ndvtest;
......@@ -917,7 +925,49 @@ WHERE bitmap10000 BETWEEN 0 AND %d;
_bitmap_select_pt01_pct_multi = """
SELECT count(*) %s
FROM cal_txtest
WHERE bitmap10000 = 0 OR bitmap10000 BETWEEN 2 AND %d+2;
WHERE bitmap10000 = 0 OR bitmap10000 BETWEEN 2 AND %d+1;
"""
_btree_select_unique = """
SELECT count(*) %s
FROM cal_txtest
WHERE btreeunique BETWEEN 0 AND %d;
"""
_btree_select_10_pct = """
SELECT count(*) %s
FROM cal_txtest
WHERE btree10 BETWEEN 0 AND %d;
"""
_btree_select_1_pct = """
SELECT count(*) %s
FROM cal_txtest
WHERE btree100 BETWEEN 0 AND %d;
"""
_btree_select_pt1_pct = """
SELECT count(*) %s
FROM cal_txtest
WHERE btree1000 BETWEEN 0 AND %d;
"""
_btree_select_pt01_pct = """
SELECT count(*) %s
FROM cal_txtest
WHERE btree10000 BETWEEN 0 AND %d;
"""
_btree_select_pt01_pct_multi = """
SELECT count(*) %s
FROM cal_txtest
WHERE btree10000 = 0 OR btree10000 BETWEEN 2 AND %d+1;
"""
_btree_select_unique_in = """
SELECT count(*) %s
FROM cal_txtest
WHERE btreeunique IN ( %s );
"""
_bitmap_index_join = """
......@@ -926,6 +976,12 @@ FROM cal_txtest f JOIN cal_dim d ON f.bitmap10000 = d.dim_id
WHERE d.dim_id2 BETWEEN 0 AND %d;
"""
_btree_index_join = """
SELECT count(*) %s
FROM cal_txtest f JOIN cal_dim d ON f.btree10000 = d.dim_id
WHERE d.dim_id2 BETWEEN 0 AND %d;
"""
_bfv_join = """
SELECT count(*)
FROM cal_bfvtest ft, cal_bfv_dim dt1
......@@ -938,6 +994,7 @@ FROM cal_ndvtest
WHERE val <= 1000000;
"""
# Parameterize methods for the test queries above
# -----------------------------------------------------------------------------
......@@ -945,63 +1002,139 @@ WHERE val <= 1000000;
def parameterize_bitmap_index_10_narrow(paramValue):
return _bitmap_select_10_pct % ("", paramValue)
def parameterize_bitmap_index_10_wide(paramValue):
return _bitmap_select_10_pct % (", max(txt)", paramValue)
# bitmap index scan with 0...100 % of values, for parameter values 0...10,000, in .01 % increments
def parameterize_bitmap_index_10000_narrow(paramValue):
return _bitmap_select_pt01_pct % ("", paramValue)
def parameterize_bitmap_index_10000_wide(paramValue):
return _bitmap_select_pt01_pct % (", max(txt)", paramValue)
# bitmap index scan with 0...100 % of values, for parameter values 0...10,000, in .01 % increments, multiple ranges
def parameterize_bitmap_index_10000_multi_narrow(paramValue):
return _bitmap_select_pt01_pct_multi % ("", paramValue)
def parameterize_bitmap_index_10000_multi_wide(paramValue):
return _bitmap_select_pt01_pct_multi % (", max(txt)", paramValue)
# bitmap index scans using a btree index on an AO table, on the unique column and on the NDV=100 column
def parameterize_btree_index_unique_narrow(paramValue):
return _btree_select_unique % ("", paramValue)
def parameterize_btree_index_unique_wide(paramValue):
return _btree_select_unique % (", max(txt)", paramValue)
def parameterize_btree_index_100_narrow(paramValue):
return _btree_select_1_pct % ("", paramValue)
def parameterize_btree_index_100_wide(paramValue):
return _btree_select_1_pct % (", max(txt)", paramValue)
# bitmap index scan on AO btree index with 0...100 % of values, for parameter values 0...10,000, in .01 % increments
def parameterize_btree_index_10000_narrow(paramValue):
return _btree_select_pt01_pct % ("", paramValue)
def parameterize_btree_index_10000_wide(paramValue):
return _btree_select_pt01_pct % (", max(txt)", paramValue)
# bitmap index scan on AO btree index with 0...100 % of values, for parameter values 0...10,000, in .01 % increments, multiple ranges
def parameterize_btree_index_10000_multi_narrow(paramValue):
return _btree_select_pt01_pct_multi % ("", paramValue)
def parameterize_btree_index_10000_multi_wide(paramValue):
return _btree_select_pt01_pct_multi % (", max(txt)", paramValue)
def parameterize_btree_unique_in_narrow(paramValue):
inlist = "0"
for p in range(1, paramValue+1):
inlist += ", " + str(5*p)
return _btree_select_unique_in % ("", inlist)
def parameterize_btree_unique_in_wide(paramValue):
inlist = "0"
for p in range(1, paramValue+1):
inlist += ", " + str(5*p)
return _btree_select_unique_in % (", max(txt)", inlist)
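# Example (illustration only): paramValue = 3 makes both methods above
# generate the in-list "0, 5, 10, 15", i.e. the predicate
#   WHERE btreeunique IN ( 0, 5, 10, 15 );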
# index join with 0...100 % of fact values, for parameter values 0...10,000, in .01 % increments
def parameterize_bitmap_join_narrow(paramValue):
return _bitmap_index_join % ("", paramValue)
def parameterize_bitmap_join_wide(paramValue):
return _bitmap_index_join % (", max(f.txt)", paramValue)
def parameterize_btree_join_narrow(paramValue):
return _btree_index_join % ("", paramValue)
def parameterize_btree_join_wide(paramValue):
return _btree_index_join % (", max(f.txt)", paramValue)
def parameterize_insert_join_bfv(paramValue):
return _insert_into_bfv_tables % (paramValue, paramValue)
def parameterize_insert_ndv(paramValue):
return _insert_into_ndv_tables % (paramValue)
def parameterize_bitmap_join_bfv(paramValue):
return _bfv_join
def parameterize_bitmap_index_ndv(paramValue):
return _bitmap_index_ndv
def noSetupRequired(paramValue):
return "SELECT 1;"
def explain_bitmap_index(conn, sqlStr):
return explain_index_scan(conn, sqlStr)
def reset_index_test(conn):
execute_sql_arr(conn, _reset_index_scan_forces)
def force_table_scan(conn):
execute_sql_arr(conn, _force_sequential_scan)
def force_bitmap_scan(conn):
execute_sql_arr(conn, _force_index_scan)
def reset_index_join(conn):
execute_sql_arr(conn, _reset_index_join_forces)
def force_hash_join(conn):
execute_sql_arr(conn, _force_hash_join)
def force_index_join(conn):
execute_sql_arr(conn, _force_index_nlj)
......@@ -1009,26 +1142,34 @@ def force_index_join(conn):
# Helper methods for running tests
# -----------------------------------------------------------------------------
def run_one_bitmap_scan_test(conn, testTitle, paramValueLow, paramValueHigh, setup, parameterizeMethod, execute_n_times):
plan_ids = [ BITMAP_SCAN, TABLE_SCAN ]
force_methods = [ force_bitmap_scan, force_table_scan ]
explainDict, execDict, errors = find_crossover(conn, paramValueLow, paramValueHigh, setup, parameterizeMethod, explain_bitmap_index, reset_index_test, plan_ids, force_methods, execute_n_times)
print_results(testTitle, explainDict, execDict, errors, plan_ids)
def run_one_bitmap_scan_test(conn, testTitle, paramValueLow, paramValueHigh, setup, parameterizeMethod,
execute_n_times):
log_output("Running bitmap scan test " + testTitle)
plan_ids = [BITMAP_SCAN, TABLE_SCAN]
force_methods = [force_bitmap_scan, force_table_scan]
explainDict, execDict, errors = find_crossover(conn, paramValueLow, paramValueHigh, setup, parameterizeMethod,
explain_bitmap_index, reset_index_test, plan_ids, force_methods,
execute_n_times)
print_results(testTitle, explainDict, execDict, errors, plan_ids, execute_n_times)
def run_one_bitmap_join_test(conn, testTitle, paramValueLow, paramValueHigh, setup, parameterizeMethod, execute_n_times):
plan_ids = [ BITMAP_SCAN, TABLE_SCAN ]
force_methods = [ force_index_join, force_hash_join ]
explainDict, execDict, errors = find_crossover(conn, paramValueLow, paramValueHigh, setup, parameterizeMethod, explain_join_scan, reset_index_join, plan_ids, force_methods, execute_n_times)
print_results(testTitle, explainDict, execDict, errors, plan_ids)
def run_one_bitmap_join_test(conn, testTitle, paramValueLow, paramValueHigh, setup, parameterizeMethod,
execute_n_times):
log_output("Running bitmap join test " + testTitle)
plan_ids = [BITMAP_SCAN, TABLE_SCAN]
force_methods = [force_index_join, force_hash_join]
explainDict, execDict, errors = find_crossover(conn, paramValueLow, paramValueHigh, setup, parameterizeMethod,
explain_join_scan, reset_index_join, plan_ids, force_methods,
execute_n_times)
print_results(testTitle, explainDict, execDict, errors, plan_ids, execute_n_times)
# Main driver for the tests
# -----------------------------------------------------------------------------
def run_bitmap_index_scan_tests(conn, execute_n_times):
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, NDV=10, selectivity_pct=10*parameter_value, count(*)",
"Bitmap Scan Test; NDV=10; selectivity_pct=10*parameter_value; count(*)",
0,
10,
noSetupRequired,
......@@ -1037,102 +1178,202 @@ def run_bitmap_index_scan_tests(conn, execute_n_times):
# all full table scan, no crossover
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, NDV=10, selectivity_pct=10*parameter_value, max(txt)",
"Bitmap Scan Test; NDV=10; selectivity_pct=10*parameter_value; max(txt)",
0,
3,
6,
noSetupRequired,
parameterize_bitmap_index_10_wide,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, NDV=10000, selectivity_pct=0.01*parameter_value, count(*)",
"Bitmap Scan Test; NDV=10000; selectivity_pct=0.01*parameter_value; count(*)",
0,
20,
600 if glob_appendonly else 20,
noSetupRequired,
parameterize_bitmap_index_10000_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, NDV=10000, selectivity_pct=0.01*parameter_value, count(*), largeNDV test",
"Bitmap Scan Test; NDV=10000; selectivity_pct=0.01*parameter_value; max(txt)",
0,
300,
noSetupRequired,
parameterize_bitmap_index_10000_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, NDV=10000, selectivity_pct=0.01*parameter_value, max(txt)",
5,
25,
300 if glob_appendonly else 20,
noSetupRequired,
parameterize_bitmap_index_10000_wide,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, multi-range, NDV=10000, selectivity_pct=0.01*parameter_value, count(*)",
"Bitmap Scan Test; multi-range; NDV=10000; selectivity_pct=0.01*parameter_value; count(*)",
0,
100,
600 if glob_appendonly else 20,
noSetupRequired,
parameterize_bitmap_index_10000_multi_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, multi-range, NDV=10000, selectivity_pct=0.01*parameter_value, max(txt)",
"Bitmap Scan Test; multi-range; NDV=10000; selectivity_pct=0.01*parameter_value; max(txt)",
0,
60,
300 if glob_appendonly else 20,
noSetupRequired,
parameterize_bitmap_index_10000_multi_wide,
execute_n_times)
def run_bitmap_ndv_scan_tests(conn, execute_n_times):
run_one_bitmap_scan_test(conn,
"Bitmap Scan Test, ndv test, rows=1000000, parameter = insert statement modulo, count(*)",
1, # modulo ex. would replace x in the following: SELECT i % x FROM generate_series(1,10000)i;
10000, #max here is 10000 (num of rows)
"Bitmap Scan Test; ndv test; rows=1000000; parameter = insert statement modulo; count(*)",
1,
# modulo ex. would replace x in the following: SELECT i % x FROM generate_series(1,10000)i;
10000, # max here is 10000 (num of rows)
parameterize_insert_ndv,
parameterize_bitmap_index_ndv,
execute_n_times)
def run_bitmap_index_join_tests(conn, execute_n_times):
def run_btree_ao_index_scan_tests(conn, execute_n_times):
# use the unique btree index (no bitmap equivalent), 0 to 10,000 rows
run_one_bitmap_scan_test(conn,
"Btree Scan Test; unique; selectivity_pct=100*parameter_value/%d; count(*)" % glob_rowcount,
0,
glob_rowcount // 10, # 10% is the max allowed selectivity for a btree scan on an AO table
noSetupRequired,
parameterize_btree_index_unique_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; unique; selectivity_pct=100*parameter_value/%d; max(txt)" % glob_rowcount,
0,
glob_rowcount // 20,
noSetupRequired,
parameterize_btree_index_unique_wide,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; NDV=100; selectivity_pct=parameter_value; count(*)",
0,
5,
noSetupRequired,
parameterize_btree_index_100_narrow,
execute_n_times)
# all full table scan, no crossover
run_one_bitmap_scan_test(conn,
"Btree Scan Test; NDV=100; selectivity_pct=parameter_value; max(txt)",
0,
5,
noSetupRequired,
parameterize_btree_index_100_wide,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; NDV=10000; selectivity_pct=0.01*parameter_value; count(*)",
0,
500,
noSetupRequired,
parameterize_btree_index_10000_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; NDV=10000; selectivity_pct=0.01*parameter_value; max(txt)",
0,
1000,
noSetupRequired,
parameterize_btree_index_10000_wide,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; multi-range; NDV=10000; selectivity_pct=0.01*parameter_value; count(*)",
0,
1000,
noSetupRequired,
parameterize_btree_index_10000_multi_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; multi-range; NDV=10000; selectivity_pct=0.01*parameter_value; max(txt)",
0,
1000,
noSetupRequired,
parameterize_btree_index_10000_multi_wide,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; in-list; selectivity_pct=100*parameter_value/%d; count(*)" % glob_rowcount,
0,
5000, # length of IN list
noSetupRequired,
parameterize_btree_unique_in_narrow,
execute_n_times)
run_one_bitmap_scan_test(conn,
"Btree Scan Test; in-list; selectivity_pct=100*parameter_value/%d; max(txt)" % glob_rowcount,
0,
3000, # length of IN list
noSetupRequired,
parameterize_btree_unique_in_wide,
execute_n_times)
def run_index_join_tests(conn, execute_n_times):
run_one_bitmap_join_test(conn,
"Bitmap Join Test, NDV=10000, selectivity_pct=0.01*parameter_value, count(*)",
"Bitmap Join Test; NDV=10000; selectivity_pct=0.01*parameter_value; count(*)",
0,
900,
400,
noSetupRequired,
parameterize_bitmap_join_narrow,
execute_n_times)
run_one_bitmap_join_test(conn,
"Bitmap Join Test, NDV=10000, selectivity_pct=0.01*parameter_value, max(txt)",
"Bitmap Join Test; NDV=10000; selectivity_pct=0.01*parameter_value; max(txt)",
0,
900,
300,
noSetupRequired,
parameterize_bitmap_join_wide,
execute_n_times)
run_one_bitmap_join_test(conn,
"Bitmap Join BFV Test, Large Data, parameter = num rows inserted",
10000, # num of rows inserted
"Btree Join Test; NDV=10000; selectivity_pct=0.01*parameter_value; count(*)",
0,
500,
noSetupRequired,
parameterize_btree_join_narrow,
execute_n_times)
run_one_bitmap_join_test(conn,
"Btree Join Test; NDV=10000; selectivity_pct=0.01*parameter_value; max(txt)",
0,
400,
noSetupRequired,
parameterize_btree_join_wide,
execute_n_times)
def run_bfv_join_tests(conn, execute_n_times):
run_one_bitmap_join_test(conn,
"Bitmap Join BFV Test; Large Data; parameter = num rows inserted",
10000, # num of rows inserted
900000,
parameterize_insert_join_bfv,
parameterize_bitmap_join_bfv,
execute_n_times)
# common parts of all test suites, create tables, run tests, drop objects
# -----------------------------------------------------------------------------
# create the table(s), as regular or AO table, and insert num_rows into the main table
def createDB(conn, use_ao, num_rows):
global glob_appendonly
create_options = ""
if use_ao:
create_options = _with_appendonly
glob_appendonly = True
create_cal_table_stmt = _create_cal_table % create_options
create_bfv_table = _create_bfv_table % create_options
create_ndv_table = _create_ndv_table % create_options
insert_into_temp_stmt = _insert_into_temp % (1,num_rows)
insert_into_other_stmt = _insert_into_other_tables % (1,10000)
insert_into_temp_stmt = _insert_into_temp % num_rows
insert_into_other_stmt = _insert_into_other_tables % (1, glob_dim_table_rows)
execute_sql(conn, _drop_tables)
execute_sql(conn, create_cal_table_stmt)
execute_sql(conn, create_bfv_table)
......@@ -1145,27 +1386,103 @@ def createDB(conn, use_ao, num_rows):
execute_sql_arr(conn, _create_index_arr)
execute_sql_arr(conn, _create_bfv_index_arr)
execute_sql_arr(conn, _create_ndv_index_arr)
if use_ao:
execute_sql_arr(conn, _create_btree_indexes_ao_arr)
execute_sql_arr(conn, _create_btree_indexes_arr)
execute_sql(conn, _analyze_table)
commit_db(conn)
def dropDB(conn):
execute_sql(conn, _drop_tables)
# smooth statistics for a single integer column uniformly distributed between 1 and row_count, with a given row count and NDV
#
# For NDVs of 100 or less, list all of them
# For NDVs of more than 100, generate a histogram with 100 buckets
# Set the correlation to 0 for all columns, since the data was shuffled randomly
def smoothStatisticsForOneCol(conn, table_name, attnum, row_count, ndv):
# calculate stadistinct value and ndv, if specified as -1
if ndv == -1:
stadistinct = -1
ndv = row_count
else:
stadistinct = ndv
# correlation to physical row ordering is 0 for all columns
corr = 0.0
# stakind: 1 is a list of most common values and frequencies, 2 is a histogram with range buckets
stakind = 1
# arrays for stanumbers and stavalues
stanumbers = []
stavalues = []
stanumbers_txt = "NULL"
num_values = min(ndv, 100)
if ndv <= 100:
# produce "ndv" MCVs, each with the same frequency
for i in range(1,num_values+1):
stanumbers.append(str(float(1)/ndv))
stavalues.append(str(i))
stanumbers_txt = "'{ " + ", ".join(stanumbers) + " }'::float[]"
else:
# produce a uniformly distributed histogram with 100 buckets (101 boundaries)
stakind = 2
stavalues.append(str(1))
for j in range(1,num_values+1):
stavalues.append(str((j*ndv) // num_values))
stavalues_txt = "'{ " + ", ".join(stavalues) + " }'::int[]"
execute_sql(conn, _update_pg_stats % (stadistinct, stakind, stanumbers_txt, stavalues_txt, corr, table_name, attnum))
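# Example outputs (illustration only): a column with ndv = 10 takes the MCV
# branch and gets stavalues 1...10, each with frequency 0.1, while a column
# with ndv = 1000 takes the histogram branch and gets the 101 bucket
# boundaries 1, 10, 20, ..., 1000.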
# ensure that we have perfect histogram statistics on the relevant columns
def smoothStatistics(conn):
def smoothStatistics(conn, num_fact_table_rows):
prev_table_name = ""
if glob_gpdb_major_version > 5:
execute_sql(conn, _allow_system_mods)
else:
execute_sql(conn, _allow_system_mods_v5)
execute_sql_arr(conn, _fix_statistics)
for tup in _stats_cols_to_fix:
# note that col_name is just for human readability
(table_name, col_name, attnum, ndv, table_rows) = tup
if table_rows == -1:
table_rows = num_fact_table_rows
smoothStatisticsForOneCol(conn, table_name, attnum, table_rows, ndv)
if prev_table_name != table_name:
prev_table_name = table_name
execute_sql(conn, _update_pg_class % (table_rows, table_name))
commit_db(conn)
def inspectExistingTables(conn):
global glob_rowcount
global glob_appendonly
sqlStr = "SELECT count(*) from cal_txtest"
curs = dbconn.query(conn, sqlStr)
rows = curs.fetchall()
for row in rows:
glob_rowcount = row[0]
log_output("Row count of existing fact table is %d" % glob_rowcount)
sqlStr = "SELECT lower(unnest(reloptions)) from pg_class where relname = 'cal_txtest'"
curs = dbconn.query(conn, sqlStr)
rows = curs.fetchall()
for row in rows:
if re.search("appendonly", row[0]):
glob_appendonly = True
if glob_appendonly:
log_output("Existing fact table is append-only")
else:
log_output("Existing fact table is not an append-only table")
def main():
global glob_verbose
global glob_log_file
global glob_rowcount
args, parser = parseargs()
if args.logFile != "":
glob_log_file = open(args.logFile, "wt", 1)
......@@ -1175,23 +1492,40 @@ def main():
conn = connect(args.host, args.port, args.dbName)
select_version(conn)
if args.create:
glob_rowcount = args.numRows
createDB(conn, args.appendOnly, args.numRows)
smoothStatistics(conn)
smoothStatistics(conn, args.numRows)
else:
inspectExistingTables(conn)
for test_unit in args.tests:
if test_unit == "all":
run_bitmap_index_scan_tests(conn, args.execute)
run_bitmap_index_join_tests(conn, args.execute)
if glob_appendonly:
# the btree tests are for bitmap scans on AO tables using btree indexes
run_btree_ao_index_scan_tests(conn, args.execute)
run_index_join_tests(conn, args.execute)
# skip the long-running bitmap_ndv_scan_tests and bfv_join_tests
elif test_unit == "bitmap_scan_tests":
run_bitmap_index_scan_tests(conn, args.execute)
elif test_unit == "bitmap_join_tests":
run_bitmap_index_join_tests(conn, args.execute)
elif test_unit == "bitmap_ndv_scan_tests":
run_bitmap_ndv_scan_tests(conn, args.execute)
elif test_unit == "btree_ao_scan_tests":
run_btree_ao_index_scan_tests(conn, args.execute)
elif test_unit == "index_join_tests":
run_index_join_tests(conn, args.execute)
elif test_unit == "bfv_join_tests":
run_bfv_join_tests(conn, args.execute)
elif test_unit == "none":
print("Skipping tests")
if args.drop:
dropDB(conn)
conn.close()
if glob_log_file != None:
glob_log_file.close()
if __name__ == "__main__":
main()
......@@ -87,7 +87,7 @@ CTypeModifierTest:
TypeModifierColumn TypeModifierCast TypeModifierConst TypeModifierDoubleMappableConst TypeModifierArrayRef;
CIndexScanTest:
BTreeIndex-Against-InList BTreeIndex-Against-ScalarSubquery
BTreeIndex-Against-InList BTreeIndex-Against-InListLarge BTreeIndex-Against-ScalarSubquery
IndexScan-AOTable IndexScan-DroppedColumns IndexScan-BoolTrue IndexScan-BoolFalse
IndexScan-Relabel IndexGet-OuterRefs LogicalIndexGetDroppedCols NewBtreeIndexScanCost
IndexScan-ORPredsNonPart IndexScan-ORPredsAOPart IndexScan-AndedIn;