Convert Non-correlated EXISTS subquery to a LIMIT 1 AND a JOIN
Enable GPORCA to generate better plans for non-correlated exists subquery in the WHERE clause Consider the following exists subquery, `(select * from bar)`. GPORCA generates an elaborate count based implementation of this subquery. If bar is a fact table, the count is going to be expensive. ``` vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar); QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Gather Motion 3:1 (slice3; segments: 3) (cost=0.00..1368262.79 rows=400324 width=8) -> Nested Loop (cost=0.00..1368250.86 rows=133442 width=8) Join Filter: true -> Table Scan on foo (cost=0.00..461.91 rows=133442 width=8) Filter: a = b -> Materialize (cost=0.00..438.57 rows=1 width=1) -> Broadcast Motion 1:3 (slice2) (cost=0.00..438.57 rows=3 width=1) -> Result (cost=0.00..438.57 rows=1 width=1) Filter: (count((count()))) > 0::bigint -> Aggregate (cost=0.00..438.57 rows=1 width=8) -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..438.57 rows=1 width=8) -> Aggregate (cost=0.00..438.57 rows=1 width=8) -> Table Scan on bar (cost=0.00..437.95 rows=332395 width=1) Optimizer status: PQO version 2.35.1 (14 rows) ``` Planner on the other hand uses LIMIT as shown in the INIT plan. ``` vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar); QUERY PLAN ------------------------------------------------------------------------------------------------ Gather Motion 3:1 (slice2; segments: 3) (cost=0.03..13611.14 rows=1001 width=8) -> Result (cost=0.03..13611.14 rows=334 width=8) One-Time Filter: $0 InitPlan (slice3) -> Limit (cost=0.00..0.03 rows=1 width=0) -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..0.03 rows=1 width=0) -> Limit (cost=0.00..0.01 rows=1 width=0) -> Seq Scan on bar (cost=0.00..11072.84 rows=332395 width=0) -> Seq Scan on foo (cost=0.00..13611.11 rows=334 width=8) Filter: a = b Settings: optimizer=off Optimizer status: legacy query optimizer (12 rows) ``` While GPORCA doesnot support init-plan, we can nevertheless generate a better plan by using LIMIT instead of count. After this PR, GPORCA will generate the following plan with LIMIT clause. ``` vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar); QUERY PLAN ------------------------------------------------------------------------------------------------------------ Gather Motion 3:1 (slice3; segments: 3) (cost=0.00..1368262.73 rows=400324 width=8) -> Nested Loop EXISTS Join (cost=0.00..1368250.80 rows=133442 width=8) Join Filter: true -> Table Scan on foo (cost=0.00..461.91 rows=133442 width=8) Filter: a = b -> Materialize (cost=0.00..438.57 rows=1 width=1) -> Broadcast Motion 1:3 (slice2) (cost=0.00..438.57 rows=3 width=1) -> Limit (cost=0.00..438.57 rows=1 width=1) -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..438.57 rows=1 width=1) -> Limit (cost=0.00..438.57 rows=1 width=1) -> Table Scan on bar (cost=0.00..437.95 rows=332395 width=1) Optimizer status: PQO version 2.35.1 (12 rows) ```
Showing
此差异已折叠。
此差异已折叠。
想要评论请 注册 或 登录