1. 14 Mar 2019, 1 commit
  2. 27 Feb 2019, 1 commit
    • J
      refactor NUMSEGMENTS related macro (#7028) · d28b7057
      Authored by Jialun
      - Retire GP_POLICY_ALL_NUMSEGMENTS and GP_POLICY_ENTRY_NUMSEGMENTS,
        unify them to getgpsegmentCount
      - Retire GP_POLICY_MINIMAL_NUMSEGMENTS & GP_POLICY_RANDOM_NUMSEGMENTS
      - Change the NUMSEGMENTS-related macros from variable macros to function
        macros
      - Change the default return value of getgpsegmentCount to 1, which
        represents a singleton PostgreSQL in utility mode
      - Change __GP_POLICY_INVALID_NUMSEGMENTS to GP_POLICY_INVALID_NUMSEGMENTS
      d28b7057
  3. 15 Feb 2019, 1 commit
  4. 14 Feb 2019, 1 commit
    • P
      Handle parameterized paths correctly when creating a join path. (#6770) · 5a808652
      Authored by Paul Guo
      Since we have had parameterized paths since PostgreSQL 9.2, and LATERAL since 9.3
      (although we do not support the full functionality yet), merge join paths and
      hash join paths need to take them into account. Besides, for nestloop paths the
      previous code was wrong:

      1. It did not allow motion for paths that include an index
         (path_contains_inner_index()). That is wrong.  Here are two examples of
         index paths which do allow motion:
      
      ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.17..24735.67 rows=86100 width=0)
          ->  Index Only Scan using t2i on t2  (cost=0.17..21291.67 rows=28700 width=0)
      
      ->  Broadcast Motion 1:3  (slice1; segments: 1)  (cost=0.17..6205.12 rows=259 width=8)
          ->  Index Scan using t2i on t2  (cost=0.17..6201.67 rows=29 width=8)
              Index Cond: (4 = a)
      
      2. The inner path and outer path might require parameters from upper nodes for
         parameterized paths, so the current check
           bms_overlap(inner_req_outer, outer_path->parent->relids)
         is definitely not sufficient; besides, the outer_path could have parameterized
         paths as well.
      
      For nestloop joins, case 1 is covered by the test case added in join_gp.
      Case 2 is partially covered by the test case in join.sql added in this
      patch (although currently ignored).

      Note that the change in this patch is conservative. In theory, we could follow
      the subplan code and allow broadcast of the base rel if needed (with that
      solution no motion is needed), but that needs much effort and does not seem
      worthwhile given that we will probably refactor the related code for
      LATERAL support in the near future.
      5a808652
  5. 12 Feb 2019, 1 commit
    • H
      Ensure that Motion nodes in parameterized plans are not rescanned. · 25763c22
      Authored by Heikki Linnakangas
      In plans with a Nested Loop join on the inner side of another Nested Loop
      join, the planner could produce a plan where a Motion node was rescanned.
      That produced an error at execution time:
      
      ERROR:  illegal rescan of motion node: invalid plan (nodeMotion.c:1604)  (seg0 slice4 127.0.0.1:40000 pid=27206) (nodeMotion.c:1604)
      HINT:  Likely caused by bad NL-join, try setting enable_nestloop to off
      
      Make sure we add a Materialize node to shield the Motion node from rescanning
      in such cases.
      
      While we're at it, add an explicit flag to MaterialPaths and plans, to
      indicate that the Material node was added to shield the child node from
      rescanning. There was a weaker test in ExecInitMaterial itself, which just
      checked if the immediate child was a Motion node, but that feels
      sketchy; what if there's a Result node in between, for example? However, I
      kept the direct check for a Motion node, too, because I'm not sure if there
      are other places where we add Material nodes on top of Motions, aside from
      the call in create_nestloop_path() that this fixes. ORCA probably does
      that, at least.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6769
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      25763c22
  6. 01 Feb 2019, 1 commit
    • H
      Use normal hash operator classes for data distribution. · 242783ae
      Authored by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
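      As a hedged illustration of the syntax (the table and column names here are hypothetical, not from this commit):

      ```sql
      -- Default hash operator class for the datatype is used:
      CREATE TABLE orders_default (id int4, note text) DISTRIBUTED BY (id);

      -- Explicitly pick an operator class for the distribution key column,
      -- similar to specifying an opclass per column in CREATE INDEX:
      CREATE TABLE orders_legacy (id int4, note text) DISTRIBUTED BY (id cdbhash_int4_ops);
      ```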
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, all the legacy integer operator classes, cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops, are part of the
      cdbhash_integer_ops operator family.
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that collocation between old and new
      tables works, for example. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet, should do that.)
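      A hedged sketch of how the GUC might be used around an upgrade (the database and table names are placeholders):

      ```sql
      -- Force CREATE TABLE to pick the legacy cdbhash_*_ops opclasses by default:
      SET gp_use_legacy_hashops = on;
      CREATE TABLE upgraded_style (id int4) DISTRIBUTED BY (id);

      -- Keep the behaviour for all new tables in an upgraded database,
      -- as pg_upgrade does per database ('mydb' is a placeholder):
      ALTER DATABASE mydb SET gp_use_legacy_hashops = on;
      ```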
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to the planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbid that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely in PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE
      INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
      242783ae
  7. 15 Dec 2018, 1 commit
    • H
      Refactor executor code for TableScan, DynamicTableScan, BitmapHeapScan. · db516347
      Authored by Heikki Linnakangas
      This removes a lot of GPDB-specific code that was used to deal with
      dynamic scans, and code duplication between nodes dealing with Heap, AO
      and AOCS tables.
      
      * Resurrect SeqScan node. We had replaced it with TableScan in GPDB.
        Teach SeqScan to also work on append-only and AOCS tables, and remove
        TableScan and all the code changes that were made in GPDB earlier to
        deal with all three table types.
      
      * Merge BitmapHeapScan, BitmapAppendOnlyScan, and BitmapTableScan node
        types. They're all BitmapHeapScans now. We used to use BitmapTableScans
        in ORCA-generated plans, and BitmapHeapScan/BitmapAppendOnlyScan in
        planner-generated plans, and there was no good reason for the
        difference. The "heap" part in the name is a bit misleading, but I
        prefer to keep the upstream name, even though it now handles AO tables
        as well. It's more like the old BitmapTableScan now, which also handled
        all three table types, but the code is refactored to stay as close to
        upstream as possible.
      
      * Introduce DynamicBitmapHeapScan. BitmapTableScan used to perform Dynamic
        scans too, now it's the responsibility of the new DynamicBitmapHeapScan
        plan node, just like we have DynamicTableScan and DynamicIndexScan as
        wrappers around SeqScans and IndexScans.
      
      * Get rid of BitmapAppendOnlyPath in the planner, too. Use BitmapHeapPath
        also for AO tables.
      
      * Refactor the way Dynamic Table Scan works. A Dynamic Table Scan node
        is now just a thin wrapper around SeqScan. It initializes a new
        SeqScan executor node for every partition, and lets it do the actual
        scanning. It now works the same way that I refactored Dynamic Index
        Scans to work in commit 198f701e. This allowed removing a lot of code
        that we used to use for both Dynamic Index Scans and Dynamic Table
        Scans, but is no longer used.
      
      There's now some duplication in the Dynamic* nodes, to walk through the
      partitions. They all have a function called setPidIndex(), for example,
      which does the same thing. But I think it's much more clear this way,
      than the previous DynamicController stuff. We could perhaps extract some
      of the code to common helper functions, but I think this is OK for now.
      
      This also fixes issue #6274. I'm not sure what exactly the bug was, but it
      was clearly in the Bitmap Table Scan code that is used with ORCA-generated
      plans. Now that we use the same code for plans generated with the Postgres
      planner and ORCA, it's not surprising that the bug is gone.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      db516347
  8. 07 Dec 2018, 1 commit
    • N
      Create partition table with same numsegments for parent and children · 8f898338
      Authored by Ning Yu
      When creating a partition table we want the children to have the same
      numsegments as the parent.  As they all set their numsegments to DEFAULT,
      does this meet our expectation?  No, because DEFAULT does not always
      equal DEFAULT itself.  When DEFAULT is set to RANDOM, a different
      value is returned each time.
      
      So we have to align numsegments explicitly.
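      As a hedged illustration of the expected behaviour (the table and partition names are hypothetical):

      ```sql
      -- Parent and child partitions should now report the same numsegments:
      CREATE TABLE sales (id int, region int)
          DISTRIBUTED BY (id)
          PARTITION BY LIST (region)
          (PARTITION r1 VALUES (1), PARTITION r2 VALUES (2));

      SELECT localoid::regclass, numsegments
          FROM gp_distribution_policy
          WHERE localoid::regclass::text LIKE 'sales%';
      ```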
      
      Also removed an incorrect assert and comment.
      8f898338
  9. 03 Dec 2018, 2 commits
  10. 27 Nov 2018, 1 commit
    • H
      Replace PathKey with new DistributionKey struct, in CdbPathLocus. · 882958da
      Authored by Heikki Linnakangas
      In PostgreSQL, a PathKey represents sort ordering, but we have been using
      it in GPDB to also represent the distribution keys of hash-distributed
      data in the planner. (i.e. the keys in DISTRIBUTED BY of a table, but also
      when data is redistributed by some other key on the fly). That's been
      convenient, and there's some precedent for that, since PostgreSQL also
      uses PathKey to represent GROUP BY columns, which is quite similar to
      DISTRIBUTED BY.
      
      However, there are some differences. The opfamily, strategy and nulls_first
      fields in PathKey are not applicable to distribution keys. Using the same
      struct to represent ordering and hash distribution is sometimes convenient,
      for example when we need to test whether the sort order or grouping is
      "compatible" with the distribution. But at other times, it's confusing.
      
      To clarify that, introduce a new DistributionKey struct, to represent
      a hashed distribution. While we're at it, simplify the representation of
      HashedOJ locus types, by including a List of EquivalenceClasses in
      DistributionKey, rather than just one EC like a PathKey has. CdbPathLocus
      now has only one 'distkey' list that is used for both Hashed and HashedOJ
      locus, and it's a list of DistributionKeys. Each DistributionKey in turn
      can contain multiple EquivalenceClasses.
      
      Looking ahead, I'm working on a patch to generalize the "cdbhash"
      mechanism, so that we'd use the normal Postgres hash opclasses for
      distribution keys, instead of hard-coding support for specific datatypes.
      With that, the hash operator class or family will be an important part of
      the distribution key, in addition to the datatype. The plan is to store
      that also in DistributionKey.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      882958da
  11. 23 Nov 2018, 1 commit
    • N
      Fix numsegments when appending multiple SingleQEs · fa86f160
      Authored by Ning Yu
      When an Append node contains a SingleQE subpath we used to put the Append on ALL
      the segments; however, if the SingleQE is only partially distributed then
      apparently we cannot put the SingleQE on ALL the segments, and this
      conflict could result in runtime errors or incorrect results.
      
      To fix this we should put Append on SingleQE's segments.
      
      On the other hand, when there are multiple SingleQE subpaths we should
      put the Append on the common segments of the SingleQEs.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      fa86f160
  12. 19 Nov 2018, 1 commit
    • A
      Add support for executing foreign tables on master, any or all segments · 3c6c6ab2
      Authored by Adam Lee
      This commit adds support for the `mpp_execute 'MASTER | ANY |
      ALL SEGMENTS'` option for foreign tables.

      MASTER is the default: the FDW requests data from the master.

      ANY: the FDW requests data from the master or any one segment, depending on
      which path costs less.

      ALL SEGMENTS: the FDW requests data from all segments, so wrappers need to
      have a policy matching the segments to data.
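      As a hedged sketch (the foreign table and server names are hypothetical; only the mpp_execute option and its values come from this commit):

      ```sql
      -- Hypothetical foreign server and table; only the mpp_execute option
      -- is from this commit, the rest is illustrative.
      CREATE FOREIGN TABLE events_ft (id int, payload text)
          SERVER some_fdw_server
          OPTIONS (mpp_execute 'all segments');

      -- Switch back to requesting data from the master only:
      ALTER FOREIGN TABLE events_ft OPTIONS (SET mpp_execute 'master');
      ```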
      
      For instance, file_fdw probes the mpp_execute value, then loads different
      files based on the segment number. But something like gpfdist on the
      foreign side doesn't need this, since it hands out a different slice of the
      data to each request; all segments can request the same location.
      3c6c6ab2
  13. 12 Nov 2018, 1 commit
    • P
      Fix another issue of inheritance table · 39856768
      Authored by Pengzhou Tang
      Previously, when creating an APPEND node for an inheritance table, if
      the subpaths had different numbers of segments in gp_distribution_policy,
      the whole APPEND node might be assigned a wrong numsegments, so some
      segments could not get plans and data was lost from the results.
      39856768
  14. 05 Nov 2018, 1 commit
  15. 12 Oct 2018, 1 commit
  16. 28 Sep 2018, 1 commit
    • Z
      Allow tables to be distributed on a subset of segments · 4eb65a53
      Authored by ZhangJackey
      There was an assumption in GPDB that a table's data is always
      distributed on all segments. However, this is not always true: for example,
      when a cluster is expanded from M segments to N (N > M), all the tables
      are still on M segments. To work around the problem we used to have to
      alter all the hash-distributed tables to randomly distributed to get
      correct query results, at the cost of bad performance.
      
      Now we support table data to be distributed on a subset of segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  By doing so we can allow DML on M-segment tables; joins
      between M-segment and N-segment tables are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
      4eb65a53
  17. 25 Sep 2018, 1 commit
    • P
      Allow to add motion to unique-ify the path in create_unique_path(). (#5589) · e9fe4224
      Authored by Paul Guo
      create_unique_path() can be used to convert a semi join into an inner join.
      Previously, during the semi-join refactoring in commit d4ce0921, creating a unique
      path was disabled for the case where duplicates might be on different QEs.

      In this patch we enable adding a motion to unique-ify the path, but only if the
      unique method is not UNIQUE_PATH_NOOP. We don't create a unique path in that case
      because later on, during plan creation, it is possible to create a motion
      above this unique path whose subpath is a motion. In that case, the unique path
      node would be ignored and we would get a motion plan node above a motion plan
      node, which is bad. We could improve that further, but not in this patch.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      e9fe4224
  18. 21 Sep 2018, 1 commit
    • H
      Remove duplicated code to handle SeqScan, AppendOnlyScan and AOCSScan. · ff8161a2
      Authored by Heikki Linnakangas
      They were all treated the same, with the SeqScan code being duplicated
      for AppendOnlyScans and AOCSScans. That is a merge hazard: if some code
      is changed for SeqScans, we would have to remember to manually update
      the other copies. Small differences in the code had already crept in,
      although given that everything worked, I guess it had no effect. Or
      only had a small effect on the computed costs.
      
      To avoid the duplication, use SeqScan for all of them. Also get rid of
      TableScan as a separate node type, and have ORCA translator also create
      SeqScans.
      
      The executor for SeqScan node can handle heap, AO and AOCS tables, because
      we're not actually using the upstream SeqScan code for it. We're using the
      GPDB code in nodeTableScan.c, and a TableScanState, rather than
      SeqScanState, as the executor node. That's how it worked before this patch
      already, what this patch changes is that we now use SeqScan *before* the
      executor phase, instead of SeqScan/AppendOnlyScan/AOCSScan/TableScan.
      
      To avoid having to change all the expected outputs for tests that use
      EXPLAIN, add code to still print the SeqScan as "Seq Scan", "Table Scan",
      "Append-only Scan" or "Append-only Columnar Scan", depending on whether
      the plan was generated by ORCA, and what kind of a table it is.
      ff8161a2
  19. 19 Sep 2018, 1 commit
    • H
      Fix "could not find pathkey item to sort" error with MergeAppend plans. · 1722adb8
      Authored by Heikki Linnakangas
      When building a Sort node to represent the ordering that is preserved
      by a Motion node, in make_motion(), the call to make_sort_from_pathkeys()
      would sometimes fail with "could not find pathkey item to sort". This
      happened when the ordering was over a UNION ALL operation. When building
      Motion nodes for MergeAppend subpaths, the path keys that represented the
      ordering referred to the items in the append rel's target list, not the
      subpaths. In create_merge_append_plan(), where we do a similar thing for
      each subpath, we correctly passed the 'relids' argument to
      prepare_sort_from_pathkeys(), so that prepare_sort_from_pathkeys() can
      match the target list entries of the append relation with the entries of
      the subpaths. But when creating the Motion nodes for each subpath, we
      were passing NULL as 'relids' (via make_sort_from_pathkeys()).
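      As a hedged illustration only (hypothetical tables; the real regression query lives in 'olap_group'), the failing plans involved an ordered result over a UNION ALL, roughly of this shape:

      ```sql
      -- A sorted Motion over a MergeAppend of UNION ALL subpaths could hit
      -- "could not find pathkey item to sort" before this fix.
      SELECT a, count(*)
      FROM (SELECT a FROM t_part1
            UNION ALL
            SELECT a FROM t_part2) u
      GROUP BY a
      ORDER BY a;
      ```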
      
      At a high level, the fix is straightforward: we need to pass the correct
      'relids' argument to prepare_sort_from_pathkeys(), in
      cdbpathtoplan_create_motion_plan(). However, the current code structure
      makes that not so straightforward, so this required some refactoring of
      the make_motion() and related functions:
      
      Previously, make_motion() and make_sorted_union_motion() would take a path
      key list as argument, to represent the ordering, and it called
      make_sort_from_pathkeys() to extract the sort columns, operators etc.
      After this patch, those functions take arrays of sort columns, operators,
      etc. directly as arguments, and the caller is expected to do the call to
      make_sort_from_pathkeys() to get them, or build them through some other
      means. In cdbpathtoplan_create_motion_plan(), call
      prepare_sort_from_pathkeys() directly, rather than the
      make_sort_from_pathkeys() wrapper, so that we can pass the 'relids'
      argument. Because prepare_sort_from_pathkeys() is marked as 'static', move
      cdbpathtoplan_create_motion_plan() from cdbpathtoplan.c to createplan.c,
      so that it can call it.
      
      Add test case. It's a slightly reduced version of a query that we already
      had in 'olap_group' test, but seems better to be explicit. Revert the
      change in expected output of 'olap_group', made in commit 28087f4e,
      which memorized the error in the expected output.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/5695.
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      1722adb8
  20. 29 Aug 2018, 1 commit
  21. 24 Aug 2018, 1 commit
    • J
      Fix redistribute bug on some types which need to convert (#5568) · b0fbb5c7
      Authored by Jinbao Chen
      After the 8.4 merge, we have two restriction lists, 'mergeclause_list'
      and 'hashclause_list', in the function 'add_paths_to_joinrel'. We
      use mergeclause_list for the cdb motion in hash joins. But some of the
      keys should not be used as distribution keys.

      Add a whitelist of which operators are distribution-compatible.
      b0fbb5c7
  22. 15 Aug 2018, 1 commit
  23. 13 Aug 2018, 1 commit
    • X
      Remove cdbpath_rows function · b2411b59
      Authored by xiong-gang
      Replace the function `cdbpath_rows(root, path)` with path->rows; this is more in
      line with upstream 9.2 and thus removes a GPDB_92_MERGE_FIXME.

      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      b2411b59
  24. 03 Aug 2018, 1 commit
  25. 02 Aug 2018, 1 commit
    • R
      Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Authored by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints now are performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plan was supported for specific parameter values even when using
        prepared statements.
      
      * API for FDW was improved to provide multiple access "paths" for their tables,
        allowing more flexibility in join planning.
      
      * Security_barrier option was added for views to prevent optimizations that
        might allow view-protected data to be exposed to users.
      
      * Range data type was added to store a lower and upper bound belonging to its
        base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on QD and dispatches
        the plan of the SELECT query to QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      4750e1b6
  26. 09 Jul 2018, 1 commit
  27. 29 Mar 2018, 1 commit
    • P
      Support replicated table in GPDB · 7efe3204
      Authored by Pengzhou Tang
      * Support replicated table in GPDB

      Currently, tables in GPDB are distributed across all segments by hash or randomly. There
      is a requirement to introduce a new table type, called a replicated table, where all
      segments have a full duplicate of the table's data.
      
      To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark
      a replicated table, and a new locus type named CdbLocusType_SegmentGeneral to specify
      the distribution of the tuples of a replicated table.  CdbLocusType_SegmentGeneral implies
      the data is generally available on all segments but not available on the qDisp, so a plan
      node with this locus type can be flexibly planned to execute on either a single QE or all
      QEs. It is similar to CdbLocusType_General; the only difference is that a
      CdbLocusType_SegmentGeneral node can't be executed on the qDisp. To guarantee this, we try
      our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when planning
      motion for a join, even when the other rel has a bottleneck locus type. One problem is that
      such a motion may be redundant if the single QE is not ultimately promoted to execute on the
      qDisp, so we need to detect that case and omit the redundant motion at the end of
      apply_motion(). We don't reuse CdbLocusType_Replicated since it always implies a broadcast
      motion below it, and it is not easy to plan such a node as direct dispatch to avoid getting
      duplicate data.

      We don't support replicated tables with inherit/partition-by clauses yet; the main problem
      is that update/delete on multiple result relations can't work correctly at the moment. We
      can fix this later.
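      As a hedged illustration (assuming the DISTRIBUTED REPLICATED syntax added for this feature; the table name is hypothetical):

      ```sql
      -- Every segment stores a full copy of the table's data.
      CREATE TABLE config_lookup (key text, value text) DISTRIBUTED REPLICATED;

      -- The new policytype column should mark the replicated policy ('r' assumed here).
      SELECT localoid::regclass, policytype
          FROM gp_distribution_policy
          WHERE localoid = 'config_lookup'::regclass;
      ```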
      
      * Allow spi_* to access replicated tables on QEs

      Previously, GPDB didn't allow a QE to access non-catalog tables because the
      data is incomplete; we can remove this limitation now if it only accesses
      replicated tables.

      One problem is that a QE needs to know whether a table is a replicated table.
      Previously, QEs didn't maintain the gp_distribution_policy catalog, so we need
      to pass the policy info to QEs for replicated tables.
      
      * Change the schema of gp_distribution_policy to identify replicated tables

      Previously, we used a magic number, -128, in the gp_distribution_policy table
      to identify a replicated table, which was quite a hack, so we add a new column
      to gp_distribution_policy to identify replicated tables and partitioned
      tables.

      This commit also abandons the old way of using a 1-length NULL list and a
      2-length NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED
      FULLY clauses.

      Besides, this commit refactors the code to make the decision-making around
      distribution policies clearer.
      
      * Support COPY for replicated tables

      * Disable the row ctid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to address queries
        like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t3), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
                  ->  Hash Join
                        Hash Cond: t2.c2 = t1.c2
                      ->  Seq Scan on t2
                      ->  Hash
                          ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because ctid
        + gp_segment_id can't identify a tuple: in a replicated table, a logical
        row may have a different ctid and gp_segment_id on each segment. So we
        disable such plans for replicated tables temporarily. This is not the best
        approach, because the rowid unique path may be cheaper than a normal hash
        semi join, so we left a FIXME for later optimization.
      
      * ORCA related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated tables.

      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog, so do the
        same checks as for other catalogs.

      * Support gpexpand on replicated tables and altering the distribution policy of replicated tables
      7efe3204
  28. 09 Mar 2018, 1 commit
    • H
      Whitespace and formatting fixes. · ff940ddc
      Authored by Heikki Linnakangas
      The immediate reason to do this was the "this ‘else’ clause does not guard"
      gcc warning from create_mergejoin_path(). But while we're at it, might as
      well clean up the whole file.
      
      I spotted one piece of code that looks broken, marked that with a FIXME to
      make sure we revisit that.
      ff940ddc
  29. 08 Mar 2018, 1 commit
    • H
      Allow using a merge join for a dummy FULL JOIN ON TRUE. · a61bf5e0
      Authored by Heikki Linnakangas
      Like in the 'join' regression test:
      
      postgres=# select * from int4_tbl a full join int4_tbl b on true;
      ERROR:  Query requires a feature that has been disabled by a configuration setting.
      DETAIL:  Could not devise a query plan for the given query.
      HINT:  Current settings:  optimizer=off
      a61bf5e0
  30. 06 Mar 2018, 1 commit
  31. 09 Feb 2018, 1 commit
    • H
      Refactor the way Semi-Joins plans are constructed. · d4ce0921
      Authored by Heikki Linnakangas
      This removes much of the GPDB machinery to handle "deduplication paths"
      within the planner. We will now use the upstream code to build JOIN_SEMI
      paths, as well as paths where the outer side of the join is first
      deduplicated (JOIN_UNIQUE_OUTER/INNER).
      
      The old style "join first and deduplicate later" plans can be better in
      some cases, however. To still be able to generate such plan, add new
      JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed
      by the deduplication step after the join, during planning.
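      As a hedged illustration only (hypothetical tables), this is the general shape of query these join strategies apply to:

      ```sql
      -- An IN/EXISTS sublink that the planner may turn into a JOIN_SEMI,
      -- a unique-ified inner/outer side, or (with JOIN_DEDUP_SEMI) an
      -- inner join followed by deduplication.
      SELECT *
      FROM orders o
      WHERE o.customer_id IN (SELECT customer_id FROM vip_customers);
      ```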
      
      This new way of constructing these plans is simpler, and allows removing
      a bunch of code, and reverting some more code to the way it is in the
      upstream.
      
      I'm not sure if this can generate the same plans that the old code could,
      in all cases. In particular, I think the old "late deduplication"
      mechanism could delay the deduplication further, all the way to the top of
      the join tree. I'm not sure when that would be useful, though, and the
      regression suite doesn't seem to contain any such cases (with EXPLAIN). Or
      maybe I misunderstood the old code. In any case, I think this is good
      enough.
      d4ce0921
  32. 24 Jan 2018, 1 commit
    • T
      Teach reparameterize_path() to handle AppendPaths. · 54e1599c
      Authored by Tom Lane
      If we're inside a lateral subquery, there may be no unparameterized paths
      for a particular child relation of an appendrel, in which case we *must*
      be able to create similarly-parameterized paths for each other child
      relation, else the planner will fail with "could not devise a query plan
      for the given query".  This means that there are situations where we'd
      better be able to reparameterize at least one path for each child.
      
      This calls into question the assumption in reparameterize_path() that
      it can just punt if it feels like it.  However, the only case that is
      known broken right now is where the child is itself an appendrel so that
      all its paths are AppendPaths.  (I think possibly I disregarded that in
      the original coding on the theory that nested appendrels would get folded
      together --- but that only happens *after* reparameterize_path(), so it's
      not excused from handling a child AppendPath.)  Given that this code's been
      like this since 9.3 when LATERAL was introduced, it seems likely we'd have
      heard of other cases by now if there were a larger problem.
      
      Per report from Elvis Pranskevichus.  Back-patch to 9.3.
      
      Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
      54e1599c
  33. 27 Sep 2017, 7 commits
    • E
      Don't assume a subquery's output is unique if there's a SRF in its tlist · e7ff3ef1
      Authored by Ekta Khanna and Jemish Patel
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 8 14:03:32 2014 -0400
      
          While the x output of "select x from t group by x" can be presumed unique,
          this does not hold for "select x, generate_series(1,10) from t group by x",
          because we may expand the set-returning function after the grouping step.
          (Perhaps that should be re-thought; but considering all the other oddities
          involved with SRFs in targetlists, it seems unlikely we'll change it.)
          Put a check in query_is_distinct_for() so it's not fooled by such cases.
      
          Back-patch to all supported branches.
      
          David Rowley
      
      (cherry picked from commit 2e7469dc8b3bac4fe0f9bd042aaf802132efde85)
      e7ff3ef1
    • D
      Rename all 8.4-9.0 merge FIXMEs as `GPDB_84_MERGE_FIXME` · 2228c939
      Authored by Dhanashree Kashid, Ekta Khanna and Omer Arap
      We had a bunch of FIXMEs that we added as part of the subselect merge.
      All of those FIXMEs are now marked as `GPDB_84_MERGE_FIXME` so that they can
      be grepped easily.
      2228c939
    • D
      Implement CDB like pre-join deduplication · efb2777a
      Authored by Dhanashree Kashid, Ekta Khanna and Omer Arap
      For flattened IN or EXISTS sublinks, if we chose INNER JOIN path instead
      of SEMI JOIN then we need to apply duplicate suppression.
      
      The deduplication can be done in two ways:
      1. post-join dedup
      unique-ify the inner join results. try_postjoin_dedup in CdbRelDedupInfo denotes
      whether we need to go for post-join dedup.
      
      2. pre-join dedup
      unique-ify the rows coming from the rel containing the subquery result,
      before that is joined with any other rels. join_unique_ininfo in
      CdbRelDedupInfo denotes if we need to go for pre-join dedup.
      semi_operators and semi_rhs_exprs are used for this. We ported a
      function from 9.5 to compute these in make_outerjoininfo().
      
      Upstream has a completely different implementation of this. Upstream explores JOIN_UNIQUE_INNER
      and JOIN_UNIQUE_OUTER paths for this, and deduplication is done in create_unique_path().
      GPDB does this differently since JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER are obsolete
      for us. Hence we have kept the GPDB-style deduplication mechanism as is in this merge.

      Post-join dedup has been implemented in previous merge commits.
      
      Ref [#146890743]
      efb2777a
    • S
      CDB Specific changes, other fix-ups after merging e549722a · e5f6e826
      Authored by Shreedhar Hardikar
      0. Fix up post-join dedup logic after the cherry-pick
      0. Fix pull_up_sublinks_jointree_recurse returning garbage relids
      0. Update the gporca, rangefuncs, and eagerfree answer files
      	1. gporca
      	Previously we were generating a Hash Inner Join with a
      	HashAggregate for deduplication. Now we generate a Hash
      	Semi Join, in which case we do not need to deduplicate the
      	inner side.
      
      	2. rangefuncs
      	We updated this answer file during the cherry-pick of
      	e006a24a since there was a change in plan.
      	After these cherry-picks, we are back to the original
      	plan as master. Hence we see the original error.
      
      	3. eagerfree
      	We are generating a not-very-useful subquery scan node
      	with this change. This is not producing wrong results,
      	but this subquery scan needs to be removed.
      	We will file a follow-up chore to investigate and fix this.
      
      0. We no longer need the helper function `hasSemiJoin()` to check whether
      the SpecialJoinInfo list has any SpecialJoinInfos constructed for a Semi Join
      (IN/EXISTS sublink). We have moved that check inside
      `cdb_set_cheapest_dedup()`.
      
      0. We are not exercising the pre-join-deduplication code path after
      this cherry-pick. Before this merge, we had three CDB-specific
      fields in `InClauseInfo` in which we recorded information for
      pre-join dedup in the case of simple uncorrelated IN sublinks:
      `try_join_unique`, `sub_targetlist` and `InOperators`.
      Since we now have `SpecialJoinInfo` instead of `InClauseInfo`, we need
      to devise a way to record this information in `SpecialJoinInfo`.
      We have filed a follow-up story for this.
      
      Ref [#142356521]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      e5f6e826
    • E
      Remove InClauseInfo and OuterJoinInfo · 8b63aafb
      Authored by Ekta Khanna
      Since `InClauseInfo` and `OuterJoinInfo` are now combined into
      `SpecialJoinInfo` after merging with e006a24a, this commit removes them
      from the relevant places.
      
      Access `join_info_list` instead of `in_info_list` and `oj_info_list`
      
      Previously, `CdbRelDedupInfo` contained a list of `InClauseInfo`s. While
      making join decisions and during overall join processing, we traversed this list
      and invoked cdb-specific functions: `cdb_make_rel_dedup_info()`, `cdbpath_dedup_fixup()`.

      Since `InClauseInfo` is no longer available, `CdbRelDedupInfo` will contain a list of
      `SpecialJoinInfo`s. All the cdb-specific routines which were previously called for
      the `InClauseInfo` list will now be called if `CdbRelDedupInfo` has a valid `SpecialJoinInfo`
      list and if the join type in the `SpecialJoinInfo` is `JOIN_SEMI`. A new helper routine `hasSemiJoin()`
      has been added which traverses the `SpecialJoinInfo` list to check if it contains `JOIN_SEMI`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      8b63aafb
    • E
      CDBlize the cherry-pick e006a24a · 0feb1bd9
      Authored by Ekta Khanna
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now, since `InClauseInfo` is not present, we construct a
      `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`, and later
      in the flow we construct a `SpecialJoinInfo` from the `FlattenedSublink` in
      `deconstruct_sublink_quals_to_rel()`. Hence, `try_join_unique` is added as
      part of both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      0feb1bd9
    • E
      Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Authored by Ekta Khanna
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
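      As a hedged illustration of the sublink shapes this converts (tables are hypothetical):

      ```sql
      -- EXISTS is converted into a semi join:
      SELECT c.name
      FROM customers c
      WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

      -- NOT EXISTS is converted into an anti join:
      SELECT c.name
      FROM customers c
      WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
      ```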
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      fe2eb2c9