  1. 05 Jan, 2017 · 6 commits
  2. 24 Dec, 2016 · 2 commits
  3. 20 Dec, 2016 · 2 commits
  4. 15 Dec, 2016 · 2 commits
  5. 14 Dec, 2016 · 2 commits
  6. 08 Dec, 2016 · 4 commits
    • D
    • D
      No-Op Motion requires distribution of relational child [#134885333] · cd6f9edf
      Before no-op motions were introduced, a hash-redistribute motion was
      intended to actually move tuples around, so it was justifiable for it to
      require 'ANY' distribution from its child; hence such motions shared
      optimization contexts whenever everything else was identical.
      A no-op motion, however, was intended _not_ to move tuples. That is, a
      no-op motion that hash-redistributes by column `b` on top of a relation
      that is hash-distributed on column `a` is not merely undesirable; it is
      entirely unintended. To rule out such plans, we made hash-redistribute
      motions a bit more intelligent when their distribution specifications say
      `no-op`: they now require the relation under them to be distributed
      exactly as the motion is (as opposed to `ANY`).
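The rule above can be modeled in a few lines. This is a minimal Python sketch, not GPORCA code: `HashRedistribute`, `required_child_hash_cols` and `child_acceptable` are hypothetical names, a distribution is reduced to a tuple of hash columns, and `None` stands in for the `ANY` requirement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HashRedistribute:
    """Hypothetical model of a hash-redistribute motion (not GPORCA's class)."""
    hash_cols: Tuple[str, ...]
    no_op: bool = False  # True if the motion is not expected to move tuples


def required_child_hash_cols(motion: HashRedistribute) -> Optional[Tuple[str, ...]]:
    """Distribution the motion requests from its child; None stands for ANY."""
    if motion.no_op:
        # A no-op motion moves nothing, so the child must already be hashed
        # on exactly the same columns.
        return motion.hash_cols
    # A real redistribute moves tuples itself and accepts any input.
    return None


def child_acceptable(
    motion: HashRedistribute, child_hash_cols: Optional[Tuple[str, ...]]
) -> bool:
    """True if a child hashed on child_hash_cols satisfies the motion's request."""
    required = required_child_hash_cols(motion)
    return required is None or required == child_hash_cols
```

In this model a no-op motion on `b` over a child hashed on `a` is rejected, while a regular redistribute on `b` is accepted. Because the no-op request now depends on the motion's own hash columns, no-op motions on different columns no longer present the identical `ANY` request to their child, which (under the model's assumptions) is why they would stop sharing an optimization context.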
      A nice side effect of this change is that no-op motions that distribute
      on columns not covered by the output columns of the child group will no
      longer match any group expressions. This is a bit nuanced, but a minimal
      query to reproduce it looks something like:
      
      ```
      CREATE TABLE foo (a int, b int) DISTRIBUTED BY (a);
      
      EXPLAIN SELECT b, a FROM foo UNION ALL SELECT b, a FROM foo INTERSECT ALL SELECT b, a FROM foo;
      ```
      
      Before this change, the (wrong) plan looked like the following:
      
      ```
      Physical plan:
      +--CPhysicalMotionGather(master)
         +--CPhysicalParallelUnionAll
            |--CPhysicalMotionHashDistribute HASHED NO-OP: "a" (0)
            |  +--CPhysicalTableScan "foo" ("foo")
            +--CPhysicalMotionHashDistribute HASHED NO-OP: "gp_segment_id" (17)
               +--CPhysicalLeftSemiHashJoin
                  |--CPhysicalSequenceProject (HASHED: "b" (10) "a" (9)
                  |  |--CPhysicalSort  ( (97,1.0), "b" (10), NULLsLast ) ( (97,1.0), "a" (9), NULLsLast )
                  |  |  +--CPhysicalTableScan "foo" ("foo")
                  |  +--CScalarProjectList
                  |     +--CScalarProjectElement "row_number" (27)
                  |        +--CScalarWindowFunc (row_number , Agg: false , Distinct: false)
                  |--CPhysicalSequenceProject (HASHED: "b" (19) "a" (18)
                  |  |--CPhysicalSort  ( (97,1.0), "b" (19), NULLsLast ) ( (97,1.0), "a" (18), NULLsLast )
                  |  |  +--CPhysicalTableScan "foo" ("foo")
                  |  +--CScalarProjectList
                  |     +--CScalarProjectElement "row_number" (28)
                  |        +--CScalarWindowFunc (row_number , Agg: false , Distinct: false)
                  +--CScalarBoolOp (EboolopAnd)
                     |--CScalarBoolOp (EboolopNot)
                     |  +--CScalarIsDistinctFrom (=)
                     |     |--CScalarIdent "b" (10)
                     |     +--CScalarIdent "b" (19)
                     |--CScalarBoolOp (EboolopNot)
                     |  +--CScalarIsDistinctFrom (=)
                     |     |--CScalarIdent "a" (9)
                     |     +--CScalarIdent "a" (18)
                     +--CScalarBoolOp (EboolopNot)
                        +--CScalarIsDistinctFrom (=)
                           |--CScalarIdent "row_number" (27)
                           +--CScalarIdent "row_number" (28)
      ```
      
      Here, the memo group containing the no-op motion on `gp_segment_id` (17)
      looked like this:
      
      ```
      Group 3 (#GExprs: 14):
        0: CLogicalIntersectAll Output: ("b" (10), "a" (9)), Input: [("b" (10), "a" (9)), ("b" (19), "a" (18))] [ 1 2 ]
        1: CLogicalLeftSemiJoin [ 8 11 24 ]
        2: CLogicalGbAggDeduplicate( Global ) Grp Cols: ["a" (9), "b" (10), "ctid" (11), "gp_segment_id" (17), "row_number" (27)], Minimal Grp Cols: [], Join Child Keys: ["ctid" (11), "gp_segment_id" (17)], Generates Duplicates :[ 0 ]  [ 25 26 ]
        3: CLogicalGbAggDeduplicate( Global ) Grp Cols: ["a" (9), "b" (10), "ctid" (11), "gp_segment_id" (17), "row_number" (27)], Minimal Grp Cols: ["a" (9), "b" (10), "ctid" (11), "gp_segment_id" (17), "row_number" (27)], Join Child Keys: ["ctid" (11), "gp_segment_id" (17)], Generates Duplicates :[ 1 ]  [ 27 26 ]
        4: CPhysicalStreamAggDeduplicate( Global ) Grp Cols: ["a" (9), "b" (10), "ctid" (11), "gp_segment_id" (17), "row_number" (27)], Key Cols:["ctid" (11), "gp_segment_id" (17)], Generates Duplicates :[ 1 ]  (High) [ 27 26 ]
          Cost Ctxts:
            main ctxt (stage 0)1.1, child ctxts:[0], rows:1.000000 (group), cost: 862.001458
            main ctxt (stage 0)3.1, child ctxts:[0], rows:1.000000 (group), cost: 862.001458
        5: CPhysicalStreamAggDeduplicate( Global ) Grp Cols: ["a" (9), "b" (10), "ctid" (11), "gp_segment_id" (17), "row_number" (27)], Key Cols:["ctid" (11), "gp_segment_id" (17)], Generates Duplicates :[ 0 ]  [ 25 26 ]
          Cost Ctxts:
            main ctxt (stage 0)1.1, child ctxts:[4], rows:1.000000 (group), cost: 862.001429
            main ctxt (stage 0)3.1, child ctxts:[4], rows:1.000000 (group), cost: 862.001429
        6: CPhysicalLeftSemiHashJoin (High) [ 8 11 24 ]
          Cost Ctxts:
            main ctxt (stage 0)1.0, child ctxts:[5, 6], rows:1.000000 (group), cost: 862.001394
            main ctxt (stage 0)1.1, child ctxts:[3, 5], rows:1.000000 (group), cost: 862.001294
            main ctxt (stage 0)1.2, child ctxts:[4, 4], rows:1.000000 (group), cost: 862.001375
            main ctxt (stage 0)1.3, child ctxts:[3, 3], rows:1.000000 (group), cost: 862.001294
            main ctxt (stage 0)1.5, child ctxts:[2, 2], rows:1.000000 (group), cost: 862.002153
            main ctxt (stage 0)1.6, child ctxts:[0, 0], rows:1.000000 (group), cost: 862.001652
            main ctxt (stage 0)3.0, child ctxts:[5, 6], rows:1.000000 (group), cost: 862.001394
            main ctxt (stage 0)3.1, child ctxts:[3, 5], rows:1.000000 (group), cost: 862.001294
            main ctxt (stage 0)3.2, child ctxts:[4, 4], rows:1.000000 (group), cost: 862.001375
            main ctxt (stage 0)3.3, child ctxts:[3, 3], rows:1.000000 (group), cost: 862.001294
            main ctxt (stage 0)3.5, child ctxts:[2, 2], rows:1.000000 (group), cost: 862.002153
        7: CPhysicalLeftSemiNLJoin [ 8 11 24 ]
          Cost Ctxts:
            main ctxt (stage 0)2.1, cost lower bound: 1290.000233	 PRUNED
            main ctxt (stage 0)1.1, cost lower bound: 1290.000233	 PRUNED
            main ctxt (stage 0)3.1, cost lower bound: 1290.000233	 PRUNED
            main ctxt (stage 0)0.1, cost lower bound: 1290.000233	 PRUNED
        8: CPhysicalMotionHashDistribute HASHED NO-OP: [ +--CScalarIdent "a" (9)
       , nulls colocated ] [ 3 ]
          Cost Ctxts:
            main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 862.001294
        9: CPhysicalMotionHashDistribute HASHED NO-OP: [ +--CScalarIdent "row_number" (27)   origin: [Grp:20, GrpExpr:0]
       , nulls colocated ] [ 3 ]
          Cost Ctxts:
            main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 862.001294
        10: CPhysicalMotionHashDistribute HASHED NO-OP: [ +--CScalarIdent "b" (10)   origin: [Grp:12, GrpExpr:0]
       , nulls colocated ] [ 3 ]
          Cost Ctxts:
            main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 862.001294
        11: CPhysicalMotionHashDistribute HASHED NO-OP: [ +--CScalarIdent "gp_segment_id" (17)
       , nulls colocated ] [ 3 ]
          Cost Ctxts:
            main ctxt (stage 0)0.0, child ctxts:[1], rows:1.000000 (group), cost: 862.001294
        12: CPhysicalMotionHashDistribute STRICT HASHED: [ +--CScalarIdent "b" (10)
       +--CScalarIdent "a" (9)
       , nulls colocated ] [ 3 ]
          Cost Ctxts:
            main ctxt (stage 0)2.0, child ctxts:[1], rows:1.000000 (group), cost: 862.001315
        13: CPhysicalMotionRandom [ 3 ]
          Cost Ctxts:
        Grp OptCtxts:
          2 (stage 0): (req cols: ["a" (9), "b" (10)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [STRICT HASHED: [ +--CScalarIdent "b" (10)
       +--CScalarIdent "a" (9)
       , nulls colocated ] match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:12
          1 (stage 0): (req cols: ["a" (9), "b" (10)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [ANY  EOperatorId: 122  match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:6
          3 (stage 0): (req cols: ["a" (9), "b" (10)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [NON-SINGLETON  (NON-REPLICATED) match: satisfy], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:6
          0 (stage 0): (req cols: ["a" (9), "b" (10)], req CTEs: [], req order: [<empty> match: satisfy ], req dist: [HASHED NO-OP: [ +--CScalarIdent "b" (10)
       , nulls colocated ] match: exact], req rewind: [NON-REWINDABLE match: satisfy], req partition propagation: [Filters: [] match: satisfy ]) => Best Expr:11
      ```
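In the memo dump above, optimization context 0 requires `HASHED NO-OP` on `"b" (10)` with required columns `"a" (9)` and `"b" (10)`, yet its best expression (11) is a no-op motion on `"gp_segment_id" (17)`, a column outside that set. The side effect described earlier can be sketched as a matching check. This is a hypothetical Python model, not GPORCA code: `matching_children` is an illustrative name, each group expression is reduced to an id plus the columns it is hashed on, and the ids and columns in the usage below are made up rather than taken from the memo.

```python
from typing import Dict, FrozenSet, List, Tuple


def matching_children(
    group_output_cols: FrozenSet[str],
    delivered_hash_cols: Dict[int, Tuple[str, ...]],
    noop_hash_cols: Tuple[str, ...],
) -> List[int]:
    """Ids of group expressions that can sit under a no-op motion (sketch).

    delivered_hash_cols maps a hypothetical group-expression id to the
    columns that expression is hash-distributed on.
    """
    if not set(noop_hash_cols) <= group_output_cols:
        # The group does not produce these columns, so no expression in it
        # can satisfy an exact match on them: nothing matches.
        return []
    return sorted(
        gexpr_id
        for gexpr_id, cols in delivered_hash_cols.items()
        if cols == noop_hash_cols
    )
```

For a group producing `a` and `b`, a no-op motion on `gp_segment_id` finds no child at all, while a no-op motion on `a` only matches expressions already hashed on `a`.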
      
      After this change, we got our expected plan:
      
      ```
      +--CPhysicalMotionGather(master)
         +--CPhysicalParallelUnionAll
            |--CPhysicalMotionHashDistribute HASHED NO-OP: [ +--CScalarIdent "a" (0)
       , nulls colocated ]
            |  +--CPhysicalTableScan "foo" ("foo")
            +--CPhysicalMotionHashDistribute HASHED NO-OP: [ +--CScalarIdent "a" (9) , nulls colocated ]
               +--CPhysicalLeftSemiHashJoin
                  |--CPhysicalSequenceProject (HASHED: [ +--CScalarIdent "b" (10) +--CScalarIdent "a" (9)
       , nulls colocated ], [<empty>], [EMPTY FRAME])
                  |  |--CPhysicalSort  ( (97,1.0), "b" (10), NULLsLast ) ( (97,1.0), "a" (9), NULLsLast )
                  |  |  +--CPhysicalTableScan "foo" ("foo")
                  |  +--CScalarProjectList
                  |     +--CScalarProjectElement "row_number" (27)
                  |        +--CScalarWindowFunc (row_number , Agg: false , Distinct: false)
                  |--CPhysicalSequenceProject (HASHED: [ +--CScalarIdent "b" (19) +--CScalarIdent "a" (18)
       , nulls colocated ], [<empty>], [EMPTY FRAME])
                  |  |--CPhysicalSort  ( (97,1.0), "b" (19), NULLsLast ) ( (97,1.0), "a" (18), NULLsLast )
                  |  |  +--CPhysicalTableScan "foo" ("foo")
                  |  +--CScalarProjectList
                  |     +--CScalarProjectElement "row_number" (28)
                  |        +--CScalarWindowFunc (row_number , Agg: false , Distinct: false)
                  +--CScalarBoolOp (EboolopAnd)
                     |--CScalarBoolOp (EboolopNot)
                     |  +--CScalarIsDistinctFrom (=)
                     |     |--CScalarIdent "b" (10)
                     |     +--CScalarIdent "b" (19)
                     |--CScalarBoolOp (EboolopNot)
                     |  +--CScalarIsDistinctFrom (=)
                     |     |--CScalarIdent "a" (9)
                     |     +--CScalarIdent "a" (18)
                     +--CScalarBoolOp (EboolopNot)
                        +--CScalarIsDistinctFrom (=)
                           |--CScalarIdent "row_number" (27)
                           +--CScalarIdent "row_number" (28)
      ```
    • D
      843dc872
    • D
      Whitespaces · 2da9f1a2
      Dhanashree Kashid, Jesse Zhang and Omer Arap committed
      [#134885333]
  7. 30 Nov, 2016 · 3 commits
  8. 29 Nov, 2016 · 3 commits
    • X
      Bump GPORCA version to v1.693. · 455364f1
      Xin Zhang committed
    • J
      Focus on CentOS for now · 416c5f2b
      Jesse Zhang committed
      Also, our build process should be agnostic to the distro.
      [ci skip]
    • X
      Fix implied predicates under limit subquery [#129871531] (#125) · b4f17ca8
      Xin Zhang committed
      When there is a subquery with a limit, and the columns used by the
      predicates inside the subquery are also referenced outside it,
      `InferPredicates` (under `PexprPreprocess`) stops producing predicates
      below the limit.
      
      For example:
      ```
      explain select 1 from (select * from foo where a = 1 and b = a limit 1) x;
      
                                                 QUERY PLAN
      ------------------------------------------------------------------------------------------------
       Result  (cost=0.00..431.00 rows=1 width=4)
         ->  Result  (cost=0.00..431.00 rows=1 width=1)
               Filter: a = 1 AND b = 1
               ->  Limit  (cost=0.00..431.00 rows=1 width=8)
                     ->  Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=8)
                           ->  Table Scan on foo  (cost=0.00..431.00 rows=1 width=8)
                                 Filter: a = 1 AND a = b
       Settings:  optimizer=on
       Optimizer status: PQO version 1.687
      (9 rows)
      ```
      
      As the example above shows, we expect `b=1` to also be produced on the `Table Scan on foo`.
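Deriving `b=1` from `a = 1` and `a = b` amounts to propagating a constant through an equivalence class of columns. The following is a generic union-find sketch in Python, not the actual `InferPredicates` implementation; `infer_constant_predicates` is a hypothetical name, and it does not detect contradictory constants such as `a = 1 AND a = 2`.

```python
from typing import Dict, List, Set, Tuple


def infer_constant_predicates(
    equalities: List[Tuple[str, str]], constants: Dict[str, int]
) -> Set[Tuple[str, int]]:
    """Propagate `col = const` through `col = col` equivalences (sketch)."""
    parent: Dict[str, str] = {}

    def find(col: str) -> str:
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    # Merge columns that are equated into one equivalence class.
    for left, right in equalities:
        parent[find(left)] = find(right)

    # Bind each class (identified by its root) to its constant.
    class_const: Dict[str, int] = {}
    for col, value in constants.items():
        class_const[find(col)] = value

    # Every member of a bound class gets a `col = const` predicate.
    inferred: Set[Tuple[str, int]] = set()
    for col in list(parent):
        root = find(col)
        if root in class_const:
            inferred.add((col, class_const[root]))
    return inferred
```

With the example's predicates, `infer_constant_predicates([("a", "b")], {"a": 1})` yields both `("a", 1)` and `("b", 1)`, i.e. the `b=1` that the scan is missing.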
      
      The predicate is missing because `InferPredicates` does NOT generate duplicate predicates if a parent
      operator already generates them.
      
      The basic assumption is that all predicates should be generated as high as possible in the
      query plan, and that predicate pushdown will later move them to their proper locations.
      
      However, this assumption breaks when pushing predicates through a limit, because it is NOT semantically
      correct to push predicates from above a limit to below it. (Duplicating predicates above a limit is
      perfectly fine.)
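A toy example makes the limit argument concrete. This Python sketch uses plain lists as rows and hypothetical helper names (`limit`, `where_a_is_1`); `limit` keeps the first `n` rows, as `LIMIT` without `ORDER BY` would for this fixed input order.

```python
from itertools import islice
from typing import Iterable, List, Tuple

Row = Tuple[int, int]  # (a, b)


def limit(rows: Iterable[Row], n: int) -> List[Row]:
    """Return the first n rows, like LIMIT n on a fixed input order."""
    return list(islice(rows, n))


def where_a_is_1(rows: Iterable[Row]) -> List[Row]:
    """Keep rows satisfying a = 1."""
    return [row for row in rows if row[0] == 1]


rows: List[Row] = [(2, 2), (1, 1)]

# Predicate evaluated above the limit (the query as written).
above = where_a_is_1(limit(rows, 1))
# Predicate pushed below the limit: a different answer.
below = limit(where_a_is_1(rows), 1)
# Predicate kept below the limit and duplicated above it: no change.
duplicated = where_a_is_1(limit(where_a_is_1(rows), 1))
```

Here `above` is empty while `below` is `[(1, 1)]`, so moving a predicate from above a limit to below it changes the result, whereas re-applying a predicate above the limit leaves `duplicated` equal to `below`.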
      
      The fix checks the operator when deriving predicates from constraints. If it is a
      limit operator, we always generate all the predicates implied by its constraint, since no
      predicates will be pushed down through the limit.

      Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
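The decision rule in the fix can be sketched as follows. This is a hypothetical Python model, not the GPORCA change itself: operators and predicates are plain strings, `predicates_to_generate` is an illustrative name, and "generated by ancestors" stands in for whatever bookkeeping `InferPredicates` uses to avoid duplicates.

```python
from typing import FrozenSet, Set


def predicates_to_generate(
    operator: str,
    implied_by_constraint: FrozenSet[str],
    generated_by_ancestors: FrozenSet[str],
) -> Set[str]:
    """Inferred predicates to attach at `operator` (hypothetical model)."""
    if operator == "limit":
        # No predicate from above can later be pushed through a limit,
        # so generate everything its constraint implies.
        return set(implied_by_constraint)
    # Elsewhere, skip predicates an ancestor already generates and rely on
    # predicate pushdown to move those into place.
    return set(implied_by_constraint - generated_by_ancestors)
```

The only change relative to the pre-fix behavior in this model is the `limit` branch: at a limit the duplicate check is skipped, so predicates such as `b = 1` are regenerated on the limit's side instead of being lost.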
  9. 24 Nov, 2016 · 11 commits
  10. 23 Nov, 2016 · 5 commits