提交 · 043365ccba649ea19e0c7c22272943b19eee68e7 · Greenplum / Gpdb

01 10月, 2019 3 次提交

Bump ORCA version to 3.73.0 (#8739) · 043365cc

由 Chris Hajas 提交于 9月 30, 2019

Corresponding ORCA commits:
* Refactor: Simplify property derivation in CExpression
* Support on-demand property derivation in CDrvdPropRelational
* Rename DrvdPropArray to DrvdProp and DrvdProp2dArray to DrvdPropArray
* Bump ORCA version to 3.73.0
Authored-by: NChris Hajas <chajas@pivotal.io>

043365cc

docs - Add note for gpfdist/gpload wrong line number in error messages (#8715) · 10820fbe

由 Mel Kiyama 提交于 9月 30, 2019

* docs - Add note for gpfdist/gpload wrong line number in error messages

This will be backporrted to 6X_STABLE, 5X_STABLE, and 4.3.x

* docs - fixed typo

10820fbe

M

docs - add string_agg function as a migration issue (#8722) · b2dc27ce
由 Mel Kiyama 提交于 9月 30, 2019

b2dc27ce

30 9月, 2019 1 次提交

Reset GPDB_EXTRA_COL() markings between DATA lines. · eb1b2852

由 Heikki Linnakangas 提交于 9月 30, 2019

GPDB_EXTRA_COL() is only supposed to affect the next DATA line. But we
failed reset it between DATA lines, so the setting stayed in effect for
all the subsequent lines, too.

This is relatively harmless, because it's mostly only used for the
prodataaccess column, which is ignored by the system anyway. The only
other place where it was used was to set proexeclocation for
pg_event_trigger_dropped_objects(), which also happened to do little damage
to the subsequent lines because all the subsequent lines include the
GPDB-specific columns, which overrode the bogus GPDB_EXTRA_COL() setting.

This was broken in 6X_STABLE, but it's too late to change the catalogs
there. Bump catversion.
Reviewed-by: NGeorgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: NAdam Lee <ali@pivotal.io>

eb1b2852

27 9月, 2019 1 次提交
- H
  
  catch ImportError for gpversion (#8709) · f249fd76
  由 Huiliang.liu 提交于 9月 27, 2019
  
  f249fd76
26 9月, 2019 5 次提交

Fix GRANT/REVOKE ALL statement PANIC when the schema contains partitioned relations · 7ba2af39

由 Georgios Kokolatos 提交于 9月 26, 2019

The cause of the PANIC was an incorrectly populated list containing the
namespace information for the affected the relation. A GrantStmt contains the
necessary objects in a list named objects. This gets initially populated during
parsing (via the privilege_target rule) and processed during parse analysis based
on the target type and object type to RangeVar nodes, FuncWithArgs nodes or
plain names.

In Greenplum, the catalog information about the partition hierarchies is not
propagated to all segments. This information needs to be processed in the
dispatcher and to be added backed in the parsed statement for the segments to
consume.

In this commit, the partition hierarchy information is expanded only for the
target and object type required. The parsed statement gets updated
uncoditionally of partitioned references before dispatching for required types.

The privileges tests have been updated to get check for privileges in the
segments also.

Problem identified and initial patch by Fenggang <ginobiliwang@gmail.com>,
reviewed and refactored by me.

7ba2af39

D

Docs: fix condition in pivotal .ditaval · 97b55f41
由 David Yozie 提交于 9月 25, 2019

97b55f41
D

Docs - splitting .ditaval files to account for oss/pivotal-specific builds of the install guide · be70d7cf
由 David Yozie 提交于 9月 25, 2019

be70d7cf

docs - move install guide to gpdb repo (#8666) · 26d2db42

由 Mel Kiyama 提交于 9月 25, 2019

* docs - move install guide to gpdb repo

--move Install Guide source files back to gpdb repo.
--update config.yml and gpdb-landing-subnav.erb files for OSS doc builds.
--removed refs directory - unused utility reference pages.
--Also added more info to creating a gpadmin user.

These files have conditionalized text (pivotal and oss-only).

./supported-platforms.xml
./install_gpdb.xml
./apx_mgmt_utils.xml
./install_guide.ditamap
./preinstall_concepts.xml
./migrate.xml
./install_modules.xml
./prep_os.xml
./upgrading.xml

* docs - updated supported platforms with PXF information.

* docs - install guide review comment update

-- renamed one file from supported-platforms.xml to platform-requirements.xml

* docs - reworded requirement/warning based on review comments.

26d2db42

Fix crash in COPY FROM for non-distributed/non-replicated table · 6793882b

由 Ashwin Agrawal 提交于 9月 25, 2019

Current code for COPY FROM picks mode as COPY_DISPATCH for
non-distributed/non-replicated table as well. This causes crash. It
should be using COPY_DIRECT, which is normal/direct mode to be used
for such tables.

The crash was exposed by following SQL commands:

CREATE TABLE public.heap01 (a int, b int) distributed by (a);
INSERT INTO public.heap01 VALUES (generate_series(0,99), generate_series(0,98));
ANALYZE public.heap01;

COPY (select * from pg_statistic where starelid = 'public.heap01'::regclass) TO '/tmp/heap01.stat';
DELETE FROM pg_statistic where starelid = 'public.heap01'::regclass;
COPY pg_statistic from '/tmp/heap01.stat';

Important note: Yes, it's known and strongly recommended to not touch
the `pg_statistics` or any other catalog table this way. But it's no
good to panic either. The copy to `pg_statictics` is going to ERROR
out "correctly" and not crash after this change with `cannot accept a
value of type anyarray`, as there just isn't any way at the SQL level
to insert data into pg_statistic's anyarray columns. Refer:
https://www.postgresql.org/message-id/12138.1277130186%40sss.pgh.pa.us

6793882b

25 9月, 2019 4 次提交

docs: add active directory kerberos steps for pxf (#7055) · 8abac045

由 StanleySung 提交于 9月 25, 2019

* add ad steps in pxf krb doc

* From Lisa Owen

* distributing keytab using gpscp and gpssh

* Update gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb
Co-Authored-By: NAlexander Denissov <denalex@users.noreply.github.com>

* Update gpdb-doc/markdown/pxf/pxf_kerbhdfs.html.md.erb
Co-Authored-By: NAlexander Denissov <denalex@users.noreply.github.com>

* misc formatting edits

* a few more formatting edits

8abac045

Do not prefer libedit over readline for tab completion and history · 3827c3a6

由 Georgios Kokolatos 提交于 9月 25, 2019

Tab completion does not work for centos 7 as reported in #8575
[https://github.com/greenplum-db/gpdb/issues/8575] and installing from the rpms.
The root cause is that the binary is linked against libedit (v 0x402) which is
providing the necessary `rl_line_buffer` yet with different contents from
readline. In libedit, the variable contains the last flushed entry in the
history file whereas in readline it contains the current line in the interactive
terminal.

It is not necessary to link with libedit on the build packages so the flag is
removed.
Reviewed-by: NBradford D. Boyle <bboyle@pivotal.io>

3827c3a6

L
edits to AGGREGATE function variables (#8653) · 0a12d83a
由 Lena Hunter 提交于 9月 24, 2019
```
* edits to AGGREGATE variables

* removing unnessesary words

* removed STATEFUNC from example
```
0a12d83a
A
Revert "Fix issue for "grant all on all tables in schema xxx to yyy;"" · c7eb9abb
由 Adam Berlin 提交于 9月 24, 2019
```
This issue was causing the build pipeline to go red. Reverting for now.

This reverts commit ba6148c6.
```
c7eb9abb

24 9月, 2019 17 次提交

Fix issue for "grant all on all tables in schema xxx to yyy;" · ba6148c6

由 Fenggang 提交于 9月 24, 2019

It has been discovered in GPDB v.6 and above that a 'GRAND ALL ON ALL TABLES IN
SCHEMA XXX TO YYY;' statement will lead to PANIC.

From the resulted coredumps, a now obsolete code in QD that tried to encode
objects in a partition reference into RangeVars was identified as the culprit.
The list that the resulting vars were ancored, was expecting and treating only
StrVars. The original code was added following the premise that catalog
informations were not available in Segments. Also it tried to optimise caching,
yet the code was not fully writen.

Instead, the offending block is removed which solves the issue and allows for
greater alignment with upstream.
Reviewed-by: NGeorgios Kokolatos <gkokolatos@pivotal.io>

ba6148c6

Remove obsolete comment about SubPlan duplication. · 38c52e09

由 Heikki Linnakangas 提交于 9月 24, 2019

I did, in fact, add a test for that case in the previous commit, so the
comment that we couldn't repro it was not accurate.

38c52e09

Allow multiple SubPlan references to a single subplan. · 50239de1

由 Heikki Linnakangas 提交于 9月 24, 2019

In PostgreSQL, there can be multiple SubPlan expressions referring to the
outputs of the same subquery, but this mechanism had been lobotomized in
GPDB. There was a pass over the plan tree, fixup_subplans(), that
duplicated any subplans that were referred more than once, and the rest
of the GPDB planner and executor code assumed that there is only one
reference to each subplan. Refactor the GPDB code, mostly cdbparallelize(),
to remove that assumption, and stop duplicating SubPlans.

* In cdbparallelize(), instead of immediately recursing into the plan tree
  of each SubPlan, process the subplan list in glob->subplans as a separate
  pass. Add a new 'recurse_into_subplans' argument to plan_tree_walker() to
  facilitate that; all other callers pass 'true' so that they still recurse.

* Replace the SubPlan->qDispSliceId and initPlanParallel fields with a new
  arrays in PlannerGlobal.

* In FillSliceTable(), keep track of which subplans have already been
  recursed into, and only recurse on first encounter. (I've got a feeling
  that the executor startup is doing more work than it should need to,
  to set up the slice table. The slice information is available when the
  plan is built, so why does the executor need to traverse the whole plan
  to build the slice table? But I'll leave refactoring that for another
  day..)

* Move the logic to remove unused subplans into cdbparallelize(). This
  used to be done as a separate pass from standard_planner(), but after
  refactoring cdbparallelize(), it is now very convenient and logical to do
  the unused subplan removal there, too.

* Early in the planner, wrap SubPlan references in PlaceHolderVars. This
  is needed in case a SubPlan reference gets duplicated to two different
  slices. A single subplan can only be executed from one slice, because
  the motion nodes in the subplan are set up to send to a particular parent
  slice. The PlaceHoldeVar makes sure that the SubPlan is evaluated only
  once, and if it's needed above the bottommost Plan node where it's
  evaluated, its values is propagated to the upper Plan nodes in the
  targetlists.

There are many other plan tree walkers that still recurse to subplans from
every SubPlan reference, but AFAICS recursing twice is harmless for all
of them. Would be nice to refactor them, too, but I'll leave that for
another day.
Reviewed-by: NBhuvnesh Chaudhary <bhuvnesh2703@gmail.com>
Reviewed-by: NRichard Guo <riguo@pivotal.io>

50239de1

Replace nodeSubplan cmockery test with a fault injection case. · dc345256

由 Heikki Linnakangas 提交于 9月 24, 2019

The mock setup in the old test was very limited, the Node structs it set
up were left to zeros, and even allocated with incorrect lengths (SubPlan
vs SubPlanState). It worked just enough for the codepath that it was
testing, but IMHO it's better to test the error "in vivo", and it requires
less setup, too. So remove the mock test, and replace with a fault
injector test that exercises the same codepath.

dc345256

Handle partTabTargetlist in plan_tree_walker() and plan_tree_mutator(). · e00da975

由 Heikki Linnakangas 提交于 9月 24, 2019

I thought that after adding the fixup_subplans() pass in commit d0aea184,
we shouldn't need the check for subplans in FindEqKey, which was added in
commit 9d63d3c1, as long as we had the walker/mutator support. But for
some reason, the regression test added in commit 9d63d3c1 passes if the
contains_subplan() check is removed, ever without the walker/mutator
support, so I'm not sure where exactly that case is still blocked. But in
any case, let's be tidy, even if there is no ill-effect at the moment.

The missing walker/mutator support was noted by @hsyuan in the comments on
PR #2444 already, but we didn't act on it then.

e00da975

Quick fixup for commit · 4354f28c

由 Paul Guo 提交于 9月 24, 2019

That commit is "Necessary legal steps for python subprocess32 shipping."
Forgot to install the required file. Hmm...

4354f28c

P

Necessary legal steps for python subprocess32 shipping. · a8090c13
由 Paul Guo 提交于 9月 24, 2019

a8090c13

Use root's stat info instead of largest child's. · 8ca6c8d1

由 Zhenghua Lyu 提交于 9月 24, 2019

Currently, for partition table we have maintained some
stat info for root table if the GUC optimizer_analyze_root_partition
is set so that we could use root's stat info directly.

Previously we use largest child's stat info for root partition.
This may lead to serious issue. Consider a partition table t,
all data with null partition key goes into default partition and
it happens to be the largest child. Then for the result size of the
query that t join other table on partition key we will estimate 0
because we use the default partition's stat info which contains all
null partition key. What is worse, we may broadcast the join result.

This commit fixes this issue but leave some future work to do:
maintain STATISTIC_KIND_MCELEM and STATISTIC_KIND_DECHIST for root
table. This commit sets the GUC gp_statistics_pullup_from_child_partition
to false defaultly. Now the whole logic is:
  * if gp_statistics_pullup_from_child_partition is true, we try to
    use largest child's stat
  * if gp_statistics_pullup_from_child_partition is false, we first
    try to fetch root's stat:
      - if root contains stat info, that's fine, we just use it
      - otherwise, we still try to use largest child's stat
Co-authored-by: NJinbao Chen <jinchen@pivotal.io>

8ca6c8d1

Omit slice information for SubPlans that are not dispatched separately. · 96c6d318

由 Heikki Linnakangas 提交于 9月 24, 2019

Printing the slice information makes sense for Init Plans, which are
dispatched separately, before the main query. But not so much for other
Sub Plans, which are just part of the plan tree; there is no dispatching
or motion involved at such SubPlans. The SubPlan might *contain* Motions,
but we print the slice information for those Motions separately. The slice
information was always just the same as the parent node's, which adds no
information, and can be misleading if it makes the reader think that there
is inter-node communication involved in such SubPlans.

96c6d318

T
Update the build artifacts path for coverity pipeline · f57a3694
由 Tingfang Bao 提交于 9月 24, 2019
```
Authored-by: NTingfang Bao <bbao@pivotal.io>
```
f57a3694
T
Update the build artifacts path for gpdb_prd pipeline · 69ce9aee
由 Tingfang Bao 提交于 9月 24, 2019
```
Authored-by: NTingfang Bao <bbao@pivotal.io>
```
69ce9aee
T

fix resource definition typo · d2548d1c
由 Tingfang Bao 提交于 9月 24, 2019

d2548d1c

Update the gpdb internal build artifacts path (#8678) · e106872b

由 Tingfang Bao 提交于 9月 24, 2019

In order to maintain the gpdb build process better.
  gp-releng re-organize the build artifacts storage.

  Only the artifacts path changed, the content is still
  the same as before.
Authored-by: NTingfang Bao <bbao@pivotal.io>

e106872b

Avoid gp_tablespace_with_faults test failure by pg_switch_xlog() · efd76c4c

由 Ashwin Agrawal 提交于 9月 23, 2019

gp_tablespace_with_faults test writes no-op record and waits for
mirror to replay the same before deleting the tablespace
directories. This step fails sometime in CI and causes flaky
behavior. The is due to existing code behavior in startup and
walreceiver process. If primary writes big (means spanning across
multiple pages) xlog record, flushes only partial xlog record due to
XLogBackgroundFlush() but restarts before commiting the transaction,
mirror only receives partial record and waits to get complete
record. Meanwhile after recover, no-op record gets written in place of
that big record, startup process on mirror continues to wait to
receive xlog beyond previously received point to proceed further.

Hence, as temperory workaround till the actual code problem is not
resolved and to avoid failures for this test, switch xlog before
emitting no-op xlog record, to have no-op record at far distance from
previously emitted xlog record.

efd76c4c

L

docs - replace incorrect refs to PXF_HOME with PXF_CONF (#8663) · f02c136c
由 Lisa Owen 提交于 9月 23, 2019

f02c136c

Fix CTAS with gp_use_legacy_hashops GUC · 9040f296

由 Jimmy Yih 提交于 9月 10, 2019

When gp_use_legacy_hashops GUC was set, CTAS would not assign the
legacy hash class operator to the new table. This is because CTAS goes
through a different code path and uses the first operator class of the
SELECT's result when no distribution key is provided.

9040f296

A
Remove plan->nMotionNodes · 00326ca2
由 Ashuka Xue 提交于 9月 23, 2019
```
After commit 1c2489d0 nMotionNodes are no longer part of the plan
struct.
```
00326ca2

23 9月, 2019 8 次提交

Make estimate_hash_bucketsize MPP-correct · d6a567b4

由 Zhenghua Lyu 提交于 9月 23, 2019

In Greenplum, when estimating costs, most of the time we are
in a global view, but sometimes we should shift to a local
view. Postgres does not suffer from this issue because everything
is in one single segment.

The function `estimate_hash_bucketsize` is from postgres and
it plays a very important role in the cost model of hash join.
It should output a result based on locally view. However, the
input parameters like, rows in a table, and ndistinct of the
relation, are all taken from a global view (from all segments).
So, we have to do some compensation for it. The logic is:
  1. for broadcast-like locus, the global ndistinct is the same
     as the local one, we do the compensation by `ndistinct*=numsegments`.
  2. for the case that hash key collcated with locus, on each
     segment, there are `ndistinct/numsegments` distinct groups, so
     no need to do the compensation.
  3. otherwise, the locus has to be partitioned and not collocated with
     hash keys, for these cases, we first estimate the local distinct
     group number, and then do do the compensation.
Co-authored-by: NJinbao Chen <jinchen@pivotal.io>

d6a567b4

Refactor away add_slice_to_motion() function. · 08553cc4

由 Heikki Linnakangas 提交于 9月 23, 2019

The function did completely different things for different callers, so
seems better to move the logic to the callers instead.
Reviewed-by: NAdam Lee <ali@pivotal.io>
Reviewed-by: NNing Yu <nyu@pivotal.io>

08553cc4

Remove separate Query argument, it's the same as root->parse. · 0f7a9b47

由 Heikki Linnakangas 提交于 9月 23, 2019

Remove the Query argument from cdbparallelize(), and its apply_motion()
subroutine. Like most planner functions, these functions are passed a
"PlannerInfo root" which represents the query, and its Query struct is
available at root->parse. Passing a separate Query is confusing because you
might think that you could pass some different query, perhaps a subquery.

0f7a9b47

H
Handle VALUES expressions in plan_tree_mutator() · 88c99773
由 Heikki Linnakangas 提交于 9月 23, 2019
```
Fixes github issue https://github.com/greenplum-db/gpdb/issues/8621
```
88c99773

Remove nMotionNodes and nInitPlans from Plan struct. · 1c2489d0

由 Heikki Linnakangas 提交于 9月 23, 2019

It was quite silly to have them in the Plan struct, which all the plan
nodes "inherit", when the fields were actually only used in the topmost
node in a plan tree. The sillyness was noted in the comments, along with
"Someday, find a better place to keep it". Today is that day.

In the executor, the natural place for these is the PlannedStmt struct.
PlannedStmt contains information for the plan tree as a whole, and in fact,
we already had copies of the fields there, we were just not always using
them! PlannedStmt is only build in the last steps of planning, though.
During planning, stash them PlannerGlobal, like many other fields that
are finally copied to PlannedStmt.

There was one little wrinkle in this plan: there was a check in
EvalPlanQual, which checked that EvalPlanQual is not used on a Plan node
that has any Motions in its subtree. Move that check to ExecInitMotion().

1c2489d0

Set error code for "incompatible loci in target inheritance set" error. · e59aff50

由 Heikki Linnakangas 提交于 9月 23, 2019

There is a test case that reaches this, in the 'file_fdw' test. With
ERRCODE_INTERNAL_ERROR, the error message includes the source file location
(planner.c:1513) in the error message. That's problematic, because the line
number changes whenever we touch planner.c.

Since this error is in fact reachable, mark it as FEATURE_NOT_SUPPORTED.

e59aff50

H

Remove unused 'sliceTable' field from Plan struct. · 0a6312a1
由 Heikki Linnakangas 提交于 9月 23, 2019

0a6312a1

Replace planIsParallel by checking Plan->dispatch flag. · c1851b62

由 Heikki Linnakangas 提交于 9月 23, 2019

Commit 7d74aa55 introduced a new function, planIsParallel() to check
whether the main plan tree needs the interconnect, by checking whether
it contains any Motion nodes. However, we already determine that, in
cdbparallelize(), by setting the Plan->dispatch flag. We were just not
checking it when deciding whether the interconnect needs to be set up.
Let's just check the 'dispatch' flag, like we did earlier in the
function, instead of introducing another way of determining whether
dispatching is needed.

I'm about to get rid of the Plan->nMotionNodes field soon, which is why
I don't want any new code to rely on it.

c1851b62

21 9月, 2019 1 次提交

Enable Init Plans in queries executed locally in QEs. · 98c8b550

由 Heikki Linnakangas 提交于 9月 21, 2019

I've been wondering for some time why we have disabled constructing Init
Plans in queries that are planned in QEs, like in SPI queries that run in
user-defined functions. So I removed the diff vs upstream in
build_subplan() to see what happens. It turns out it was because we always
ran the ExtractParamsFromInitPlans() function in QEs, to get the InitPlan
values that the QD sent with the plan, even for queries that were not
dispatched from the QD but planned locally. Fix the call in InitPlan to
only call ExtractParamsFromInitPlans() for queries that were actually
dispatched from the QD, and allow QE-local queries to build Init Plans.

Include a new test case, for clarity, even though there were some existing
ones that incidentally covered this case.

98c8b550