1. 07 Sep 2017, 5 commits
    • Error out when self-ref set operation in recursive term · 2168ecc5
      Kavinder Dhaliwal committed
      This commit ensures that if the recursive term contains a self-reference
      to the recursive CTE within a set operation, an error will be produced.
      
      For example
      
      WITH RECURSIVE x(n) AS (
      	SELECT 1
      	UNION ALL
      	SELECT n+1 FROM (SELECT * FROM x UNION SELECT * FROM z)foo)
      SELECT * FROM x;
      
      Will produce an error, while
      
      WITH RECURSIVE x(n) AS (
      	SELECT 1
      	UNION ALL
      	SELECT n+1 FROM (SELECT * FROM z UNION SELECT * FROM u) foo, x WHERE foo.x = x.n)
      SELECT * FROM x;
      
      Will not, because the set operation does not contain a self-reference to
      the CTE.
    • Bring in recursive CTE to GPDB · fd61a4ca
      Haisheng Yuan committed
      The planner generates a plan that doesn't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because Motions are
      currently not rescannable in GPDB. For example, an MPP plan for a
      recursive CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      In the current solution, the WorkTableScan is always placed on the outer
      side of the topmost Join (the recursive part of the RecursiveUnion), so
      that we can safely rescan the inner child of the join without worrying
      about the materialization of a potential underlying Motion. This is a
      heuristic-based plan, not a cost-based plan.
      
      Ideally, the WorkTableScan could be placed on either side of the join at
      any depth, and the plan should be chosen based on the cost of the
      recursive plan and the number of recursions. But we will leave that for
      later work.
      
      Note: Hash join is temporarily disabled for plan generation of the
      recursive part, because if the hash table spills, the batch file is
      removed as it executes. We have a follow-up story to make a spilled hash
      table rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
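      
      For reference, a minimal sketch of the kind of query that could produce
      a plan like the one above; the table and column names are taken from the
      plan shown, while the exact query text is an assumption:
      ```
      WITH RECURSIVE subdepartment AS (
          -- non-recursive term: the starting department
          SELECT * FROM department WHERE name = 'A'
          UNION ALL
          -- recursive term: join the worktable back against department
          SELECT d.* FROM department d, subdepartment sd
          WHERE d.parent_department = sd.id
      )
      SELECT * FROM subdepartment;
      ```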
    • gp_era: change usage from md5 to sha256 · c13a9177
      Marbin Tan committed
      There is a bug in Python 2.7 where you can't use hashlib.md5() on a
      system that has FIPS mode enabled. Python 2.7 will segfault if you run
      the following:
      `python -c "import ssl; import hashlib; m = hashlib.md5(); m.update('abc');"`
      
      Use sha256 instead as a workaround for the Python 2.7 md5 issue.
      
      gp_era saves the hashed value into a file which gets read when creating
      a new mirror. It's mainly used to check whether any segments are out of
      sync with the new era file.
    • Add missing subselect test case with CTE [#150338742] · 4765e971
      Haisheng Yuan and Jesse Zhang committed
      Commit 038c36b6 from Postgres 8.3 was
      merged into Greenplum in a453004e. Commit 038c36b6 is a partial backport
      of commit 688aafa1 from Postgres 8.4. What's partial about 038c36b6 is
      the omission of a test case containing a CTE: a whole-row variable can
      refer either to an aliased `FROM` clause or to a CTE. The CTE case was
      omitted because upstream 8.3 didn't have CTEs.
      
      The non-CTE test case was slightly modified to add an `ORDER BY` clause,
      because atmsort is confused by the `ORDER BY` inside the subselect.
      Semantically we expect the differ to canonicalize (sort) the output
      before comparison, since the sort order of a subselect is not preserved
      according to the SQL standard; but in this case atmsort believes the
      output is already sorted, by virtue of the presence of `ORDER BY`, even
      though it's within the subselect.
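      
      To make that concrete, here is a hedged sketch of the test case being
      described, modeled on the upstream subselect regression test (the exact
      text may differ; int4_tbl and f1 are the standard regress table and
      column). The sub-select keeps f1 only as a resjunk column for GROUP BY /
      ORDER BY, and the whole-row reference q must not include it:
      ```
      -- whole-row variable over an aliased FROM-clause subselect
      SELECT q FROM (SELECT max(f1) FROM int4_tbl GROUP BY f1 ORDER BY f1) q;
      
      -- the CTE flavor that upstream 8.3 could not express
      WITH q AS (SELECT max(f1) FROM int4_tbl GROUP BY f1 ORDER BY f1)
      SELECT q FROM q;
      ```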
      
      Original commit message of 688aafa1 is enclosed:
      
      > Fix whole-row Var evaluation to cope with resjunk columns (again).
      >
      > When a whole-row Var is reading the result of a subquery, we need it to
      > ignore any "resjunk" columns that the subquery might have evaluated for
      > GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
      > 68e40998, but that fix only covered
      > whole-row Vars of named composite types, not those of RECORD type; and it
      > was mighty klugy anyway, since it just assumed without checking that any
      > extra columns in the result must be resjunk.  A proper fix requires getting
      > hold of the subquery's targetlist so we can actually see which columns are
      > resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
      > the bullet and add some infrastructure to make that possible.
      >
      > Per report from Andrew Dunstan and additional testing by Merlin Moncure.
      > Back-patch to all supported branches.  In 8.3, also back-patch commit
      > 292176a1, which for some reason I had
      > not done at the time, but it's a prerequisite for this change.
      
      (cherry picked from commit 688aafa15d8d83077c686d2b5b88226528e29840)
  2. 06 Sep 2017, 13 commits
    • Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Heikki Linnakangas committed
      If a prepared statement, or a cached plan for an SPI query (e.g. from a
      PL/pgSQL function), contained stable functions, the stable functions
      were incorrectly evaluated only once at plan time, instead of on every
      execution of the plan. This happened not to be a problem for queries
      that contain any parameters, because in GPDB they are re-planned on
      every invocation anyway, but parameter-less queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated
      all stable functions at planning time. Change it to false, and also
      rename it back to 'estimate', as it's called in the upstream. That flag
      was changed back in 2010, in order to allow partition pruning to work
      with quals containing stable functions, like TO_DATE. I think back then
      we always re-planned every query, so that was OK, but we do cache plans
      now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so
      that it still evaluates stable functions, even when the 'estimate' flag
      is off. But when it does so, mark the plan as "one-off", meaning that it
      must be re-planned on every execution. That gives the old, intended
      behavior that such plans are indeed re-planned, while still allowing
      plans that don't use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
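      
      A hedged illustration of the class of query that was broken (the table
      and column names here are invented): a parameter-less prepared statement
      whose stable function must reflect the time of each execution, not the
      time the plan was built.
      ```
      CREATE TABLE events (id int, created timestamptz);
      PREPARE recent AS
          SELECT count(*) FROM events WHERE created > now() - interval '1 hour';
      EXECUTE recent;  -- now() must be evaluated for this execution...
      -- some time later
      EXECUTE recent;  -- ...and again here, not frozen at plan time
      ```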
    • Fix reuse of cached plans in user-defined functions. · 2f4d8554
      Heikki Linnakangas committed
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the
      original PlannedStmt's planTree pointer to point to the modified copy of
      the plan tree. That copy has had all the parameters replaced with their
      current values, but on the next execution we should do that replacement
      again. I think that happened not to be an issue, because we had code
      elsewhere that forced re-planning of all queries anyway. Or maybe it was
      in fact broken. But in any case, stop scribbling on the original
      PlannedStmt, which might live in the plan cache, and make a temporary
      copy in CdbDispatchPlan that we can freely scribble on, and that is only
      used for the dispatch.
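      
      A hedged sketch of the situation being fixed (the function and table
      names are invented): a user-defined function whose cached plan is
      dispatched on every call; before this fix, each call leaked a copy of
      the plan tree into the long-lived CachedPlan context.
      ```
      -- assumes a table "events" already exists
      CREATE FUNCTION count_events() RETURNS bigint AS $$
      BEGIN
          -- the plan for this query is cached after the first call and
          -- re-dispatched on every subsequent call of the function
          RETURN (SELECT count(*) FROM events);
      END;
      $$ LANGUAGE plpgsql;
      
      SELECT count_events();  -- each call re-uses the cached plan
      SELECT count_events();
      ```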
    • 2f15ab8c
    • Refactor the way seqserver host and port are stored. · 208a3cad
      Heikki Linnakangas committed
      They're not really per-portal settings, so it doesn't make much sense
      to pass them to PortalStart. And most of the callers were passing
      savedSeqServerHost/Port anyway. Instead, set the "current" host and port
      in postgres.c, when we receive them from the QD.
    • Remove useless system_catalog TINC tests. · 0e9380b3
      Heikki Linnakangas committed
      All of these queries were wrapped in gpdiff ignore-blocks. What's the
      point?
    • Mark Abort/Commit/Transaction as static again. · 5fac1a58
      Heikki Linnakangas committed
      We don't care about old versions of dtrace anymore. Revert the code to
      the way it's in the upstream, to reduce our diff footprint.
    • 66842386
    • Add migrated cs_walrep CCP tests to pipeline ALL group · cdba4245
      Jimmy Yih committed
      [ci skip]
    • Migrate cs-walrepl-multinode from Pulse to CCP · 1b960a73
      Jimmy Yih committed
    • Reorder TINC walrep_2 to fix ordering test failure · 1a6797e9
      Jimmy Yih committed
      Also remove some useless Makefile targets.
    • Add TINC support with CCP · 2aea56b7
      Jimmy Yih committed
      TINC tests are planned to be migrated over to run natively in
      Concourse using CCP. This commit adds the task and script files needed
      to create the new TINC jobs.
    • Don't initialize random seed when creating a temporary file. · be894afd
      Heikki Linnakangas committed
      That seems like a very random place to do it (sorry for the pun). The
      random seed is initialized at backend startup anyway, and that ought to
      be good enough, so just remove the spurious initialization from bfz.c.
      
      In passing, improve the debug message to mention which compression
      algorithm was used.
    • Remove unnecessary parse-analysis error position callback. · b325dc8e
      Heikki Linnakangas committed
      I guess once upon a time this was needed to get better error messages,
      with error positions, but we rely on the 'location' fields in the parse
      nodes nowadays. Removing this doesn't affect any of the error messages
      memorized in the regression tests, so it's not needed anymore.
  3. 05 Sep 2017, 8 commits
  4. 04 Sep 2017, 14 commits
    • Use GetOptions for options parsing in get_ereport · c637a0d0
      Daniel Gustafsson committed
      When adding the GPTest version printing, it became clear that not
      only was the existing version printing broken, the options parsing
      was too. See the sample execution below:
      
        ./get_ereport.pl -version
        Use of uninitialized value $ARGV[0] in pattern match (m//) at ./get_ereport.pl line 99.
        Missing argument in sprintf at ./get_ereport.pl line 163.
        ./get_ereport.pl version 0.
      
      So while in there, this commit fixes both: the options are now
      properly parsed with GetOptions() using pass_through, and the version
      is printed using the GPTest module.
    • Move version printing to common module for Perl code · 7d64740b
      Daniel Gustafsson committed
      The Perl code in src/test/regress was using a mix of not printing the
      version at all, printing it wrong (due to us not using CVS anymore), or
      using a hardcoded string. Implement a new module for common test code
      called GPTest.pm which abstracts this (for now it's the only thing it
      does, but this might/will change, hence the name). The module is created
      by autoconf so that it pulls in GP_VERSION from there.
      
      While there, simplify the version output in gpdiff, which included the
      version of the system diff command - somewhat uninteresting information,
      as it's not something that changes very often, and it just cluttered up
      the output.
      
      This removes the MakeMaker support, but since we have no intention of
      packaging these programs into a CPAN module, it seems pointless to carry
      that format around.
    • Refactor Greenplum specific testcode to a new file · 01304a76
      Daniel Gustafsson committed
      regress.c is an upstream file, and all Greenplum additions can
      cause conflicts as we merge with PostgreSQL. This refactors all
      GPDB specific code into a new file, regress_gp.c, to keep the
      upstream file as close to upstream as possible (with backports).
      The new file gets compiled and loaded just like regress.c, so
      no change in how it works.
      
      Also remove an unused function, perform rudimentary codereview
      on the Greenplum tests and massage regress.c slightly to make
      it closer to upstream.
    • Refactor copy's target segment computing function · 36f2f6d6
      Xiaoran Wang committed
      The same code for computing the target segment appears in both CopyFrom
      and CopyFromDispatch. Extract that code into separate functions.
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
    • Share external URL-mapping code between planner and ORCA. · cbb8ea18
      Heikki Linnakangas committed
      The planner and the ORCA translator both implemented the same logic for
      assigning external table URIs to segments. But I spotted one case where
      the logic differed:
      
      CREATE EXTERNAL TABLE exttab_with_on_master( i int, j text )
      LOCATION ('file://@hostname@@abs_srcdir@/data/exttab_few_errors.data') ON MASTER FORMAT 'TEXT' (DELIMITER '|');
      
      SELECT * FROM exttab_with_on_master;
      ERROR:  'ON MASTER' is not supported by this protocol yet.
      
      With ORCA you got a less user-friendly error:
      
      set optimizer=on;
      set optimizer_enable_master_only_queries = on;
      postgres=# explain SELECT * FROM exttab_with_on_master;
      ERROR:  External scan error: Could not assign a segment database for external file (CTranslatorDXLToPlStmt.cpp:472)
      
      The immediate cause of that was that commit fcf82234 didn't remember to
      modify the ORCA translator's copy of the same logic. But really, it's silly
      and error-prone to duplicate the code, so modify ORCA to use the same code
      that the planner does.
    • Further refactoring of ParseFuncOrColumn and func_get_detail. · 5a7563cc
      Heikki Linnakangas committed
      This backports the new FUNCDETAIL_WINDOWFUNC return code from PostgreSQL
      8.4, and refactors the code to match upstream, as much as feasible. A few
      error scenarios now give better error messages.
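      
      One hedged example of such a scenario (the table name is invented, and
      the exact message wording is an assumption): calling a window function
      without an OVER clause is now diagnosed as such during parse analysis.
      ```
      SELECT row_number() FROM some_table;
      -- error at parse analysis: row_number() is a window function and
      -- requires an OVER clause (exact wording may differ)
      ```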
    • Fix typo in copy.c comment · 102aac6f
      Daniel Gustafsson committed
    • Replace custom expandable buffer implementation with StringInfo. · 38f354aa
      Heikki Linnakangas committed
      Simpler that way.
    • Replace redundant functions with contain_window_function() from PG 8.4 · 232ecfc3
      Heikki Linnakangas committed
      We don't need two different functions to check whether an expression
      contains a window function. Replace both with the variant used in
      the upstream, contain_window_function().
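      
      A hedged illustration of what such a check looks for (the table and
      column names are invented):
      ```
      SELECT sum(x) OVER (PARTITION BY grp) FROM some_table;  -- contains a window function
      SELECT sum(x) FROM some_table;                          -- plain aggregate, does not
      ```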
    • Cherry-pick locate_windowfunc() from PostgreSQL 8.4. · 3fc342d6
      Heikki Linnakangas committed
      This allows having error positions for more syntax errors, and reduces
      the diff footprint of our window functions implementation against the
      one in PostgreSQL 8.4.
    • Handle the failure in AssignResGroupOnMaster() · 931d5d57
      xiong-gang committed
      Because AssignResGroupOnMaster() is called before the transaction is
      actually started, a failure there won't cause a transaction abort, so we
      need to handle the error to prevent leaking the slot.
      Signed-off-by: Zhenghua Lyu <zlv@pivotal.io>
    • Cosmetic fixes, to reduce diff vs upstream. · 74fdbc5d
      Heikki Linnakangas committed
      Most notably, move the definition of XmlExpr and friends to where they are
      in the upstream.
    • Rename checkExprHasWindFuncs to checkExprHasWindowFuncs to match upstream. · e94a339a
      Heikki Linnakangas committed
      Also move the function to where it is in the upstream.
      
      To reduce our diff footprint.
    • Remove overly-complicated SzAllocate function. · 4212fdad
      Heikki Linnakangas committed
      There was only one caller, and it provided no memory pool. The fault
      injection was also unused AFAICS.