• V
    Convert Non-correlated EXISTS subquery to a LIMIT 1 AND a JOIN · e04ae39d
    Venkatesh Raghavan 提交于
    Enable GPORCA to generate better plans for non-correlated exists subquery in the WHERE clause
    
    Consider the following exists subquery, `(select * from bar)`. GPORCA generates an elaborate count based implementation of this subquery. If bar is a fact table, the count is going to be expensive.
    
    ```
    vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                        QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.79 rows=400324 width=8)
       ->  Nested Loop  (cost=0.00..1368250.86 rows=133442 width=8)
             Join Filter: true
             ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
                   Filter: a = b
             ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
                   ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                         ->  Result  (cost=0.00..438.57 rows=1 width=1)
                               Filter: (count((count()))) > 0::bigint
                               ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                     ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=8)
                                           ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                                 ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
     Optimizer status: PQO version 2.35.1
    (14 rows)
    ```
    Planner on the other hand uses LIMIT as shown in the INIT plan.
    
    ```
    vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                               QUERY PLAN
    ------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice2; segments: 3)  (cost=0.03..13611.14 rows=1001 width=8)
       ->  Result  (cost=0.03..13611.14 rows=334 width=8)
             One-Time Filter: $0
             InitPlan  (slice3)
               ->  Limit  (cost=0.00..0.03 rows=1 width=0)
                     ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..0.03 rows=1 width=0)
                           ->  Limit  (cost=0.00..0.01 rows=1 width=0)
                                 ->  Seq Scan on bar  (cost=0.00..11072.84 rows=332395 width=0)
             ->  Seq Scan on foo  (cost=0.00..13611.11 rows=334 width=8)
                   Filter: a = b
     Settings:  optimizer=off
     Optimizer status: legacy query optimizer
    (12 rows)
    ```
    
    While GPORCA doesnot support init-plan, we can nevertheless generate a better plan by using LIMIT instead of count. After this PR, GPORCA will generate the following plan with LIMIT clause.
    
    ```
    vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                     QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.73 rows=400324 width=8)
       ->  Nested Loop EXISTS Join  (cost=0.00..1368250.80 rows=133442 width=8)
             Join Filter: true
             ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
                   Filter: a = b
             ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
                   ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                         ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                               ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=1)
                                     ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                           ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
     Optimizer status: PQO version 2.35.1
    (12 rows)
    ```
    e04ae39d
CMakeLists.txt 5.3 KB