1. Apr 28, 2020 (2 commits)
  2. Jan 11, 2020 (1 commit)
    • Refactor the late "parallelization" stages of the planner. · 93abe741
      Heikki Linnakangas authored
      * Build a preliminary "planner slice table" at the end of planning, and
        attach it to the PlannedStmt. Executor startup turns that into the
        final executor slice table. This replaces the step where executor
        startup scanned the whole Plan tree to build the slice table.
      
      * Now that the executor startup gets a pre-built planner slice table, it
        doesn't need the Flow structures for building the slice table anymore.
        Also refactor the few other remaining places in nodeMotion.c and
        nodeResult.c that accessed the Flows to use the information from the
        slice table instead. The executor no longer looks at the Flows at all,
        so we don't need to include them in the serialized plan tree anymore.
        The ORCA translator doesn't need to build Flow structures anymore
        either. Instead, it now builds the planner slice table the same way
        the Postgres planner does.
      
      * During createplan.c processing, keep track of the current "slice", and
        attach direct dispatch and other per-slice information to the PlanSlice
        struct directly, instead of carrying it in the Flow structs. This
        renders the Flows mostly unused in the planner too, but there is still
        one thing we use the Flows for: to figure out when we need to add a
        Motion on top of a SubPlan's plan tree, to make the subplan's result
        available in the slice where the SubPlan is evaluated. There's a "sender
        slice" struct attached to a Motion during create_plan() processing to
        represent the sending slice. But the slice ID is not assigned at that
        stage yet. Motion / Slice IDs are assigned later, when the slice table
        is created.
      
      * Only set 'flow' on the topmost Plan, and in the child of a Motion.
      
      * Remove unused initplans and subplans near the end of planning, after
        set_plan_references(), but before building the planner slice table. We
        used to remove init plans and subplans a little bit earlier, before
        set_plan_references(), but there was at least one corner case involving
        inherited tables where you could have a SubPlan referring to a subplan
        in the plan tree before set_plan_references(), but set_plan_references()
        transformed it away. You would end up with an unused subplan in that
        case, even though we previously removed any unused subplans. This way
        we don't need to deal with unused slices in the executor.
      
      * Rewrite the logic that accounts for the direct-dispatch cost saving in
        the plan cost.
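
      The planner-to-executor handoff described in the first two bullets can be
      sketched roughly as follows. All names here (PlanSlice, ExecSlice,
      build_executor_slice_table) are simplified stand-ins for illustration,
      not Greenplum's actual structs:

      ```python
      from dataclasses import dataclass, field

      @dataclass
      class PlanSlice:
          slice_id: int
          parent_id: int      # -1 for the root slice
          numsegments: int    # segments this slice runs on

      @dataclass
      class ExecSlice:
          slice_id: int
          parent_id: int
          gang_size: int = 0
          children: list = field(default_factory=list)

      def build_executor_slice_table(plan_slices):
          """Turn the planner slice table into the executor slice table,
          wiring up parent/child links and gang sizes without walking the
          Plan tree. Assumes slice_id equals the slice's index in the list,
          as ids are assigned sequentially."""
          table = [ExecSlice(s.slice_id, s.parent_id, gang_size=s.numsegments)
                   for s in plan_slices]
          for s in table:
              if s.parent_id >= 0:
                  table[s.parent_id].children.append(s.slice_id)
          return table
      ```

      The point of the sketch is that executor startup only iterates the flat
      slice list, rather than re-deriving slices from Flow nodes in the plan.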
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
      93abe741
  3. Dec 16, 2019 (1 commit)
    • Elide explicit motion when result relation locus is not changed. · a46400ef
      Zhenghua Lyu authored
      A Delete or Update statement may add motions above the result relation
      to determine which tuples to delete or update. For example,
      
      -- t1 distributed by (b), t2 distributed by (a)
      delete from t1 using t2 where t1.b = t2.a;
      The SQL above might add a motion above t1 (the result relation) when
      creating the plan for t1 join t2.
      
      ExecDelete and ExecUpdate use the ctid to find the tuple, and a ctid only
      makes sense on its original segment. So Greenplum adds an explicit
      redistribute motion to send each tuple back to where it came from, and
      then performs the delete or update.
      
      Previously, the condition for adding the explicit redistribute motion was
      that there were no motions in the subplan of the ModifyTable. This can be
      improved: if no motion is added above the result relation, we can elide
      this motion; that is the case when the subpath's locus is equal to the
      result relation's locus and both are Hashed loci.
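      The improved condition can be expressed as a small predicate. This is an
      illustrative sketch only; the Locus type and field names are invented
      here and simplify Greenplum's actual CdbPathLocus:

      ```python
      from dataclasses import dataclass

      @dataclass(frozen=True)
      class Locus:
          loctype: str        # e.g. "Hashed", "Strewn", "Entry"
          distkeys: tuple     # distribution key expressions, as strings here
          numsegments: int

      def can_elide_explicit_motion(subpath: Locus, result_rel: Locus) -> bool:
          """Elide the explicit redistribute motion when the subpath's locus
          equals the result relation's locus and both are Hashed: every tuple
          is then already on the segment that owns its ctid."""
          return (subpath.loctype == "Hashed"
                  and result_rel.loctype == "Hashed"
                  and subpath == result_rel)
      ```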
      a46400ef
  4. Nov 21, 2019 (1 commit)
    • Merge 'one-phase commit' and 'commit not prepared' · 6e0d5998
      Gang Xiong authored
      We introduced the 'commit not prepared' command for transactions like:
         BEGIN;
         read-only queries;
         END;
      We then introduced 'one-phase commit' for a transaction that goes to a
      single segment, like:
         INSERT INTO tbl VALUES(1);
      They are actually the same thing, so merge the code together.
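      The merged decision can be sketched as one dispatcher-side function.
      Names and the string return values are invented for illustration; this
      is not Greenplum's actual code:

      ```python
      def choose_commit_protocol(segments_with_writes: int) -> str:
          """Pick the commit protocol. Read-only transactions (0 writing
          segments) and transactions writing on a single segment can both
          skip PREPARE; only multi-segment writes need two-phase commit."""
          if segments_with_writes <= 1:
              # 'commit not prepared' and 'one-phase commit' share this
              # path after the merge.
              return "one-phase"
          return "two-phase"
      ```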
      6e0d5998
  5. May 28, 2019 (1 commit)
    • Optimize explicit transactions · b43629be
      xiong-gang authored
      Currently, an explicit 'BEGIN' creates a full-size writer gang and starts
      a transaction on it; the following 'END' then commits the transaction in
      a two-phase manner. This can be optimized for some cases:
      case 1:
      BEGIN;
      SELECT * FROM pg_class;
      END;
      
      case 2:
      BEGIN;
      SELECT * FROM foo;
      SELECT * FROM bar;
      END;
      
      case 3:
      BEGIN;
      INSERT INTO foo VALUES(1);
      INSERT INTO bar VALUES(2);
      END;
      
      For case 1, it is unnecessary to create a gang, and there is no need for
      two-phase commit.
      For case 2, two-phase commit is unnecessary because the executors don't
      write any XLOG.
      For case 3, there is no need to create a full-size writer gang and run
      two-phase commit on it.
      Co-authored-by: Jialun Du <jdu@pivotal.io>
      b43629be
  6. May 21, 2019 (4 commits)
  7. Mar 15, 2019 (1 commit)
    • Retire PlannerConfig::cdbpath_segments · cd4c83a4
      Ning Yu authored
      We used to base cost calculations on this property, which is equal to the
      segment count of the cluster; however, this is wrong when the table is a
      partial one (which happens during gpexpand). We should always get
      numsegments from the motion.
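      A toy illustration of why the costing must use the motion's own
      numsegments; the function name and cost model here are made up for the
      sketch:

      ```python
      def motion_cost_per_segment(total_rows: float, motion_numsegments: int,
                                  per_row_cost: float = 1.0) -> float:
          """Per-segment cost of a motion: rows are divided across the
          segments the motion actually targets, so the divisor must be the
          motion's own numsegments, not the cluster-wide segment count."""
          return per_row_cost * total_rows / motion_numsegments
      ```

      During gpexpand a partial table may occupy, say, 4 of 8 segments;
      dividing by the cluster size of 8 would halve the estimated per-segment
      cost incorrectly.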
      
      The gangsize.sql test is updated because the slice order in some of its
      queries differs from before due to the changed costs.
      cd4c83a4
  8. Mar 11, 2019 (1 commit)
    • Retire the reshuffle method for table data expansion (#7091) · 1c262c6e
      Ning Yu authored
      This method was introduced to improve the data redistribution
      performance during gpexpand phase 2; however, benchmark results show
      that the effect does not meet our expectations. For example, when
      expanding a table from 7 segments to 8, the reshuffle method is only 30%
      faster than the traditional CTAS method, and when expanding from 4 to 8
      segments reshuffle is even 10% slower than CTAS. When there are indexes
      on the table the reshuffle performance can be worse, and an extra VACUUM
      is needed to actually free the disk space. According to our experiments,
      the bottleneck of the reshuffle method is the tuple deletion operation,
      which is much slower than the insertion operation used by CTAS.
      
      The reshuffle method does have some benefits: it requires less extra
      disk space, and it also requires less network bandwidth (similar to the
      CTAS method with the new JCH reduce method, but less than CTAS + MOD).
      It can also be faster in some cases; however, as we cannot automatically
      determine when it is faster, it is not easy to benefit from it in
      practice.
      
      On the other hand, the reshuffle method is less tested and may have bugs
      in corner cases, so it is not production-ready yet.
      
      Given all this, we decided to retire it entirely for now; we might add
      it back in the future if we can get rid of the slow deletion or find a
      reliable way to automatically choose between the reshuffle and CTAS
      methods.
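      The "JCH reduce method" mentioned above refers to jump consistent hash.
      A minimal Python sketch of that published algorithm (not Greenplum's
      implementation) shows why CTAS-based expansion moves so little data:

      ```python
      def jump_consistent_hash(key: int, num_segments: int) -> int:
          """Jump consistent hash: when the segment count grows from n to
          n+1, only about 1/(n+1) of the keys map to a new segment, so an
          expansion only needs to move that fraction of the rows."""
          b, j = -1, 0
          while j < num_segments:
              b = j
              # 64-bit linear congruential step on the key
              key = (key * 2862933555777941757 + 1) % (1 << 64)
              j = int(float(b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
          return b
      ```

      By contrast, a plain MOD mapping reassigns almost every key when the
      segment count changes, which is the extra network bandwidth the commit
      message refers to.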
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      1c262c6e
  9. Nov 27, 2018 (1 commit)
  10. Nov 22, 2018 (1 commit)
    • Pick a smarter Hashed locus for LEFT and RIGHT JOINs. · 3d6c78c9
      Heikki Linnakangas authored
      When determining the locus for a LEFT or RIGHT JOIN, we can use the outer
      side's distribution key as is. The EquivalenceClasses from the nullable
      side are not of interest above the join, and the outer side's distribution
      key can lead to better plans, because it can be made a Hashed locus,
      rather than HashedOJ. A Hashed locus can be used for grouping, for
      example, unlike a HashedOJ.
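      The locus choice can be caricatured as follows. This is a deliberate
      simplification with invented names, and the FULL-join case is an
      assumption of the sketch rather than something the commit states:

      ```python
      def result_locus(jointype: str, outer_locus: str) -> str:
          """For LEFT and RIGHT joins, the outer side's distribution key
          columns are never nulled by the join, so the result can keep the
          outer side's plain Hashed locus. A FULL join, where both sides are
          nullable, still gets the weaker HashedOJ locus."""
          if jointype == "FULL":
              return "HashedOJ"
          return outer_locus
      ```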
      
      This buys back better plans for some INSERT and CTAS queries that
      started to need Redistribute Motions after the previous commit.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      3d6c78c9
  11. Nov 07, 2018 (1 commit)
    • Adjust gang size according to numsegments · 6dd2759a
      ZhangJackey authored
      Now we have partial tables and a flexible gang API, so we can allocate
      gangs according to numsegments.
      
      With commit 4eb65a53, GPDB supports tables distributed on partial
      segments, and with the series of commits (a3ddac06, 576690f2), GPDB
      supports a flexible gang API. Now is a good time to combine the two new
      features: the goal is to create a gang only on the necessary segments
      for each slice. This commit also improves singleQE gang scheduling and
      does some code cleanup. However, if ORCA is enabled, the behavior is the
      same as before.
      
      The outline of this commit is:
      
        * Modify the FillSliceGangInfo API so that gang_size is truly flexible.
        * Remove the numOutputSegs and outputSegIdx fields in the Motion node,
           and add a new field isBroadcast to mark whether the motion is a
           broadcast motion.
        * Remove the global variable gp_singleton_segindex and choose the
           singleQE segment_id randomly (based on gp_sess_id).
        * Remove the field numGangMembersToBeActive in Slice, because it is now
           exactly slice->gangsize.
        * Modify the message printed when the GUC Test_print_direct_dispatch_info
           is set.
        * An explicit BEGIN now creates a full gang.
        * Format code and remove destSegIndex.
        * Remove the isReshuffle flag in ModifyTable; it is useless because it
           is only used when inserting a tuple into a segment that is outside
           the range of numsegments.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      6dd2759a