-
由 Bhuvnesh Chaudhary 提交于
Update the tests and bump orca version corresponding to the below fix in ORCA. If the Scalar Agg node has child which has volatile function and has universal distribution for relational child, instead of deriving the spec as that of the child i.e Universal Spec, we should derive the distribution spec as singleton. Deriving singleton distribution spec will ensure that the tree is executed on one node and the results will be distributed to the remaining segments. Consider the below setup: create table f (a text); explain insert into f select sum(random()) from (select 1); Physical plan: +--CPhysicalDML (Insert, "foo"), Source Columns: ["f1" (2)], Action: ("ColRef_0004" (4)) rows:1 width:1 rebinds:1 cost:0.016658 origin: [Grp:12, GrpExpr:2] +--CPhysicalComputeScalar rows:1 width:1 rebinds:1 cost:0.001033 origin: [Grp:23, GrpExpr:3] |--CPhysicalMotionHashDistribute HASHED: [ +--CScalarIdent "f1" (2) , nulls colocated, duplicate sensitive ] rows:1 width:1 rebinds:1 cost:0.001029 origin: [Grp:11, GrpExpr:2] | +--CPhysicalComputeScalar rows:1 width:1 rebinds:1 cost:0.001008 origin: [Grp:11, GrpExpr:1] | |--CPhysicalScalarAgg( Global ) Grp Cols: [], Minimal Grp Cols:[], Generates Duplicates :[ 0 ] rows:1 width:8 rebinds:1 cost:0.001000 origin: [Grp:6, GrpExpr:3] | | |--CPhysicalTVF rows:1000 width:1 rebinds:1 cost:0.001000 origin: [Grp:1, GrpExpr:1] | | | |--CScalarConst (1) origin: [Grp:0, GrpExpr:0] | | | +--CScalarConst (1) origin: [Grp:0, GrpExpr:0] | | +--CScalarProjectList origin: [Grp:5, GrpExpr:0] | | +--CScalarProjectElement "sum" (1) origin: [Grp:4, GrpExpr:0] | | +--CScalarAggFunc (sum , Distinct: false , Aggregate Stage: Global) origin: [Grp:3, GrpExpr:0] | | +--CScalarFunc (random) origin: [Grp:2, GrpExpr:0] | +--CScalarProjectList origin: [Grp:10, GrpExpr:0] | +--CScalarProjectElement "f1" (2) origin: [Grp:9, GrpExpr:0] | +--CScalarCoerceViaIO origin: [Grp:8, GrpExpr:0] | +--CScalarIdent "sum" (1) origin: [Grp:7, GrpExpr:0] +--CScalarProjectList origin: [Grp:22, GrpExpr:0] +--CScalarProjectElement "ColRef_0004" (4) origin: [Grp:21, GrpExpr:0] +--CScalarConst (1) origin: [Grp:0, GrpExpr:0] ", QUERY PLAN ---------------------------------------------------------------------------------------------------- Insert (cost=0.00..0.02 rows=1 width=8) -> Result (cost=0.00..0.00 rows=1 width=12) -> Result (cost=0.00..0.00 rows=1 width=8) -> Result (cost=0.00..0.00 rows=1 width=8) -> Aggregate (cost=0.00..0.00 rows=1 width=8) -> Function Scan on generate_series (cost=0.00..0.00 rows=334 width=1) Above CPhysicalMotionHashDistribute has a universal child (i.e CPhysicalTVF), and translator will convert it to a Result Node with Hash Filter. The hash filter is build on the projected column in this query i.e sum(random()).. Random being a volatile function, will cause the resulting sum to be non-deterministic i.e volatile in this case. So the hash filter may not appropriately filter out the segment and the insert may result in inserting non-deterministic number of rows in the table (with the maximum being equal to the number of segments). So, we should derive singleton distribution if the relation child of the scalar agg node delivers universal spec and the project list has volatile functions. Resulting plan will be: pivotal=# explain insert into foo select sum(random()) from (select 1) f; QUERY PLAN --------------------------------------------------------------------------------- Insert (cost=0.00..0.02 rows=1 width=8) -> Result (cost=0.00..0.00 rows=1 width=12) -> Redistribute Motion 1:3 (slice1) (cost=0.00..0.00 rows=1 width=8) Hash Key: (((sum(random())))::text) -> Result (cost=0.00..0.00 rows=1 width=8) -> Aggregate (cost=0.00..0.00 rows=1 width=8) -> Result (cost=0.00..0.00 rows=1 width=1)
19f36828