1. 21 Sep 2017, 1 commit
    • Fix multistage aggregation plan targetlists · ad166563
      Bhuvnesh Chaudhary authored
      If an aggregation query uses aliases that match the table's actual
      column names, and those aliases are propagated up through subqueries
      with grouping applied on the alias, the aggregation plan may end up
      with inconsistent targetlists and crash.
      
      ```
      CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;
      SELECT substr(a, 2) as a
      FROM
          (SELECT ('-'||a)::varchar as a
              FROM (SELECT a FROM t1) t2
          ) t3
      GROUP BY a;
      ```
      ad166563
  2. 20 Sep 2017, 1 commit
    • Cherry-pick psprintf() function from upstream, and use it. · 12c7f256
      Heikki Linnakangas authored
      This makes constructing strings a lot simpler, and less scary. I changed
      many places in GPDB code to use the new psprintf() function, where it
      seemed to make the most sense. A lot of code remains that could use it,
      but there's no urgency.
      
      I avoided changing upstream code to use it yet, even where it would make
      sense, to avoid introducing unnecessary merge conflict.
      
      The biggest changes are in cdbbackup.c, where the code to count the
      buffer sizes was really complex. I also refactored the #ifdef
      USE_DDBOOST blocks so that there is less repetition between the
      USE_DDBOOST and !USE_DDBOOST blocks; that should make it easier to
      catch, at compilation time, bugs that affect the !USE_DDBOOST case when
      compiling with USE_DDBOOST, and vice versa. I also switched to using
      pstrdup() instead of strdup() in a few places, to avoid memory leaks.
      (Although the way cdbbackup works, it only gets launched once per
      connection, so it didn't really matter in practice.)
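      The appeal of psprintf() is that the caller never sizes a buffer by
      hand. A rough sketch of the idea (not the upstream implementation,
      which allocates with palloc() in a memory context; plain malloc() is
      used here only to keep the sketch standalone):
      ```
      #include <stdarg.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* psprintf-style helper: format into a freshly allocated buffer
       * sized to fit, so callers never count bytes by hand. */
      static char *
      sketch_psprintf(const char *fmt, ...)
      {
          va_list args;
          int     needed;
          char   *buf;

          /* First pass: ask vsnprintf how many bytes the result needs. */
          va_start(args, fmt);
          needed = vsnprintf(NULL, 0, fmt, args);
          va_end(args);
          if (needed < 0)
              return NULL;

          buf = malloc((size_t) needed + 1);
          if (buf == NULL)
              return NULL;

          /* Second pass: format into the exactly-sized buffer. */
          va_start(args, fmt);
          vsnprintf(buf, (size_t) needed + 1, fmt, args);
          va_end(args);
          return buf;
      }
      ```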
      12c7f256
  3. 14 Sep 2017, 1 commit
  4. 12 Sep 2017, 2 commits
    • Fix wrong results for NOT-EXISTS sublinks with aggs & LIMIT · d8c7b947
      Shreedhar Hardikar authored
      During NOT EXISTS sublink pullup, we created a one-time false filter
      whenever the sublink contained aggregates, without checking limitcount.
      However, when the sublink contains an aggregate with LIMIT 0, we must
      not generate such a filter, as it produces incorrect results.
      
      Added regress test.
      
      Also, initialize all the members of IncrementVarSublevelsUp_context
      properly.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      d8c7b947
    • Refactor adding explicit distribution motion logic · 75339b9f
      Bhuvnesh Chaudhary authored
      nMotionNodes tracks the number of Motion nodes in a plan, and each plan
      node maintains its own nMotionNodes. Counting the Motions in a plan
      node by traversing the tree and adding up the nMotionNodes found in
      nested plans gives an incorrect count. So instead of using
      nMotionNodes, use a boolean flag to track whether the subtree,
      excluding the initplans, contains a Motion node.
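      The difference can be sketched with a hypothetical plan-node struct
      (illustrative names, not the real GPDB Plan fields): a bottom-up
      boolean directly answers the question the planner cares about, where
      summing cached per-subtree counters would double-count.
      ```
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical plan node, for illustration only: 'is_motion' marks
       * a Motion node, and left/right are child subplans. */
      typedef struct PlanSketch
      {
          bool                is_motion;
          struct PlanSketch  *left;
          struct PlanSketch  *right;
      } PlanSketch;

      /* Bottom-up check: does this subtree contain a Motion?  A single
       * boolean answers the real question directly, instead of summing
       * per-node counters that each already cover a whole subtree. */
      static bool
      subtree_contains_motion(const PlanSketch *node)
      {
          if (node == NULL)
              return false;
          return node->is_motion ||
                 subtree_contains_motion(node->left) ||
                 subtree_contains_motion(node->right);
      }
      ```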
      75339b9f
  5. 08 Sep 2017, 1 commit
  6. 07 Sep 2017, 2 commits
    • Force a stand-alone backend to run in utility mode. · abedbc23
      Heikki Linnakangas authored
      In a stand-alone backend ("postgres --single"), you cannot realistically
      expect any of the infrastructure needed for MPP processing to be present.
      Let's force a stand-alone backend to run in utility mode, to make sure
      that we don't try to dispatch queries, participate in distributed
      transactions, or anything like that, in a stand-alone backend.
      
      Fixes github issue #3172, which was one such case where we tried to
      dispatch a SET command in single-user mode, and got all confused.
      abedbc23
    • Bring in recursive CTE to GPDB · 546fa1f6
      Haisheng Yuan authored
      The planner generates plans that don't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because Motions are
      currently not rescannable in GPDB. For example, an MPP plan for a
      recursive CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      In the current solution, the WorkTableScan is always put on the outer
      side of the topmost Join (the recursive part of the RecursiveUnion), so
      that we can safely rescan the inner child of the join without worrying
      about the materialization of a potential underlying motion. This is a
      heuristic-based plan, not a cost-based plan.
      
      Ideally, the WorkTableScan can be placed on either side of the join with any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: hash join is temporarily disabled for plan generation of the
      recursive part, because if the hash table spills, the batch file is
      removed as it executes. We have a follow-up story to make spilled hash
      tables rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
      546fa1f6
  7. 06 Sep 2017, 1 commit
    • Fix reuse of cached plans in user-defined functions. · 9fc02221
      Heikki Linnakangas authored
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the original
      PlannedStmt's planTree pointer to point to the modified copy of the plan
      tree. That copy has had all the parameters replaced with their current
      values, but on the next execution, we should do that replacement again. I
      think that happened to not be an issue, because we had code elsewhere that
      forced re-planning of all queries anyway. Or maybe it was in fact broken.
      But in any case, stop scribbling on the original PlannedStmt, which might
      live in the plan cache, and make a temporary copy that we can freely
      scribble on in CdbDispatchPlan, that's only used for the dispatch.
      9fc02221
  8. 05 Sep 2017, 1 commit
    • Simplify tuple serialization in Motion nodes. · 11e4aa66
      Ning Yu authored
      * Simplify tuple serialization in Motion nodes.
      
      There is a fast-path for tuples that contain no toasted attributes,
      which writes the raw tuple almost as is. However, the slow path is
      significantly more complicated, calling each attribute's binary
      send/receive functions (although there's a fast-path for a few
      built-in datatypes). I don't see any need for calling I/O functions
      here. We can just write the raw Datum on the wire. If that works
      for tuples with no toasted attributes, it should work for all tuples,
      if we just detoast any toasted attributes first.
      
      This makes the code a lot simpler, and also fixes a bug with data
      types that don't have binary send/receive routines. We used to
      call the regular (text) I/O functions in that case, but didn't handle
      the resulting cstring correctly.
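      The simplified slow path boils down to: detoast if needed, then write
      the raw bytes with a length prefix. A standalone sketch with
      hypothetical types (not the real Datum/toast machinery):
      ```
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical attribute value for illustration: 'data'/'len' hold
       * the raw bytes; 'toasted' marks values that would need detoasting. */
      typedef struct AttrSketch
      {
          const unsigned char *data;
          uint32_t             len;
          int                  toasted;
      } AttrSketch;

      /* Stand-in for detoasting: the real code expands a toasted value to
       * its full inline form first; here we just clear the flag. */
      static AttrSketch
      detoast_sketch(AttrSketch a)
      {
          a.toasted = 0;
          return a;
      }

      /* Serialize one attribute as <length><raw bytes>: the idea behind
       * the simplified Motion path, with no per-type binary send function.
       * Returns the number of bytes written. */
      static size_t
      serialize_attr(AttrSketch a, unsigned char *out)
      {
          if (a.toasted)
              a = detoast_sketch(a);
          memcpy(out, &a.len, sizeof(a.len));
          memcpy(out + sizeof(a.len), a.data, a.len);
          return sizeof(a.len) + a.len;
      }
      ```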
      
      Diagnosis and test case by Foyzur Rahman.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      11e4aa66
  9. 01 Sep 2017, 3 commits
    • Fix Copyright and file headers across the tree · ed7414ee
      Daniel Gustafsson authored
      This bumps the copyright years to the appropriate years after not
      having been updated for some time.  Also reformats existing code
      headers to match the upstream style to ensure consistency.
      ed7414ee
    • Set errcode on AO checksum errors · 60f9ac3d
      Daniel Gustafsson authored
      The missing errcode makes the ereport call include the line number
      of the invocation from the .c file, which not only isn't too useful
      but also causes the tests to fail when adding or removing code in the
      file.
      60f9ac3d
    • Always read return value from stat calls · 742e5415
      Daniel Gustafsson authored
      {f}stat() can fail, and reading the stat buffer without checking the
      return status is bad hygiene. Always test the return value and take
      the appropriate error path in case of a stat error.
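      The pattern the commit enforces, as a small illustrative helper (not
      GPDB code): the stat buffer is consulted only after stat() reports
      success, since on failure its contents are undefined.
      ```
      #include <stdio.h>
      #include <sys/stat.h>

      /* Return the size of 'path', or -1 on stat() failure.  The stat
       * buffer is read only after the call has been checked for success. */
      static long long
      file_size_checked(const char *path)
      {
          struct stat st;

          if (stat(path, &st) != 0)
          {
              perror("stat");
              return -1;
          }
          return (long long) st.st_size;
      }
      ```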
      742e5415
  10. 30 Aug 2017, 2 commits
    • Remove misc unused code. · 37d2a5b3
      Heikki Linnakangas authored
      'nuff said.
      37d2a5b3
    • Eliminate '#include "utils/resowner.h"' from lock.h · 6b25c0a8
      Heikki Linnakangas authored
      It was getting in the way of backporting commit 9b1b9446f5 from PostgreSQL,
      which added an '#include "storage/lock.h"' to resowner.h, forming a cycle.
      
      The include was only needed for the declaration of the awaitedOwner
      global variable. Replace "ResourceOwner" with the equivalent "struct
      ResourceOwnerData *" to avoid it.
      
      This revealed a bunch of other files that were relying on resowner.h
      being indirectly included through lock.h. Include resowner.h directly
      in those files.
      
      The ResPortalIncrement.owner field was not used for anything, so instead
      of including resowner.h in that file, just remove the field that needed
      it.
      6b25c0a8
  11. 29 Aug 2017, 1 commit
    • Perform resource group operations only when it's initialized · 939208b5
      Pengzhou Tang authored
      The resource group is enabled but not initialized on auxiliary
      processes and on special backends like ftsprobe and filerep.
      Previously we performed resource group operations whether or not the
      resource group was initialized, which led to unexpected errors.
      939208b5
  12. 28 Aug 2017, 3 commits
    • Avoid side effects in assertions · 288dde95
      Daniel Gustafsson authored
      An assertion with a side effect may alter the main codepath when
      the tree is built with --enable-cassert, which in turn may lead
      to subtle differences due to compiler optimizations and/or straight
      bugs in the side effect. Rewrite the assertions without side
      effects to leave the main codepath intact.
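      A minimal illustration (names made up): the side effect runs
      unconditionally, and the assertion only inspects the stored result, so
      builds compiled with NDEBUG take the same codepath.
      ```
      #include <assert.h>

      /* Counter lets us observe whether an expression actually ran. */
      static int calls = 0;

      static int
      next_value(void)
      {
          return ++calls;
      }

      /* Wrong pattern: assert(next_value() > 0) would make the call
       * disappear entirely when asserts are compiled out.  The safe
       * rewrite performs the side effect unconditionally and asserts
       * only on the stored result. */
      static int
      safe_pattern(void)
      {
          int v = next_value();   /* side effect always happens */

          assert(v > 0);          /* assertion only inspects the result */
          return v;
      }
      ```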
      288dde95
    • Optimize `COPY TO ON SEGMENT` result processing · 266355d3
      Adam Lee authored
      Don't send nonsense '\n' characters just for counting; let segments
      report how many rows were processed instead.
      Signed-off-by: Ming LI <mli@apache.org>
      266355d3
    • Check distribution key restriction for `COPY FROM ON SEGMENT` · 65321259
      Xiaoran Wang authored
      When using `COPY FROM ON SEGMENT`, we copy data from a local file
      directly into the table on the segment. While copying, we apply the
      distribution policy to each record to compute its target segment. If
      the target segment ID isn't equal to the current segment ID, we report
      an error to preserve the distribution key restriction.
      
      Because the segment has no metadata about the table's distribution and
      partition policies, we copy the distribution policy of the main table
      from master to segment in the query plan. When the parent table and a
      partitioned sub-table have different distribution policies, it is
      difficult to check the distribution key restriction across all
      sub-tables. In that case, we report an error.
      
      In case a partitioned table's distribution policy is RANDOMLY and
      differs from the parent table's, the user can use the GUC
      `gp_enable_segment_copy_checking` to disable this check.
      
      The distribution key restriction is checked as follows:

      1) Table isn't partitioned:
          Compute the data's target segment. If the data doesn't belong to
          this segment, report an error.

      2) Table is partitioned and the partitioned table's distribution
      policy is the same as the main table's:
          Compute the data's target segment. If the data doesn't belong to
          this segment, report an error.

      3) Table is partitioned and the partitioned table's distribution
      policy differs from the main table's:
          Checking is not supported; report an error.
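      Cases 1) and 2) above reduce to the same per-row test, sketched here
      with a placeholder hash-to-segment mapping (GPDB's real cdbhash policy
      differs):
      ```
      #include <stdint.h>

      /* Placeholder mapping from a distribution-key hash to a segment. */
      static int
      target_segment(uint32_t key_hash, int num_segments)
      {
          return (int) (key_hash % (uint32_t) num_segments);
      }

      /* Returns 0 if the row may be loaded on 'current_segment', -1 if it
       * violates the distribution key restriction (the real code reports
       * an error instead of returning). */
      static int
      check_row_placement(uint32_t key_hash, int num_segments,
                          int current_segment)
      {
          return target_segment(key_hash, num_segments) == current_segment
                 ? 0 : -1;
      }
      ```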
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
      Signed-off-by: Ming LI <mli@apache.org>
      Signed-off-by: Adam Lee <ali@pivotal.io>
      65321259
  13. 19 Aug 2017, 3 commits
    • Calculate checksum during persistent table reset_all. · 19082615
      Ashwin Agrawal authored
      When `gp_persistent_reset_all()` is called, the gp_relation_node index
      is truncated, but the checksum calculation was missed when writing the
      meta-page. Ideally, a following `gp_persistent_build_all()` fixes
      this, correctly rebuilding the index and calculating the checksum.
      Still, it is better to get a proper message like "ERROR: Did not find
      gp_relation_node entry for relation name....." on operations after
      reset_all than a page verification failure.
      19082615
    • Add assert checking to "UnderLock" functions · af8dc9c2
      Jacob Champion authored
      Signed-off-by: Taylor Vesely <tvesely@pivotal.io>
      af8dc9c2
    • Introduce LW locks to protect filespace and tablespace hash tables. · 677c3eac
      Taylor Vesely authored
      Adds two new LW locks to protect the filespace and tablespace hash
      tables, disambiguating them from PersistentObjLock. Previously,
      PersistentObjLock was overloaded to protect these hash tables along
      with the persistent heap tables. If two backends try to flush the same
      dirty buffer, a deadlock could arise: backend 1 holds
      PersistentObjLock and requests the io_in_progress lock of the buffer
      to be evicted, while backend 2 holds the io_in_progress lock on the
      same buffer and attempts to obtain a file path. Because file paths
      live in hash tables protected by PersistentObjLock, backend 2 requests
      PersistentObjLock and blocks on backend 1.
      Signed-off-by: Asim R P <apraveen@pivotal.io>
      677c3eac
  14. 18 Aug 2017, 1 commit
    • Fix check for all-Const target list, in single-row-insert dispatch. · 2b497f09
      Heikki Linnakangas authored
      If you have a simple insert, like "INSERT INTO foo VALUES ('bar')", we
      evaluate the target list (i.e. 'bar') in the master, and route the
      insert to the correct partition and segment based on the constants.
      However, there was a mismatch between allConstantValuesClause() and
      what its callers assumed. The callers assumed that if
      allConstantValuesClause() returned true, the target list contained
      only Const nodes. But in reality, allConstantValuesClause() also
      returned true if the target list contained non-volatile function
      expressions that could be evaluated to produce a constant result.

      Fix the mismatch by making allConstantValuesClause() stricter, so that
      it only returns true if all the entries are true Consts.
      
      Fixes github issue #285, reported by @liruto.
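      The stricter check amounts to scanning the target list and accepting
      only genuine Const nodes. A sketch with stand-in node tags (not the
      real PostgreSQL NodeTag enum):
      ```
      #include <stdbool.h>
      #include <stddef.h>

      /* Minimal stand-ins for planner node tags, for illustration only. */
      typedef enum { T_CONST_SKETCH, T_FUNCEXPR_SKETCH } NodeTagSketch;

      typedef struct ExprSketch
      {
          NodeTagSketch tag;
      } ExprSketch;

      /* Strict version of the check: true only when every target-list
       * entry is a plain Const.  A stable function call would evaluate to
       * a constant too, but callers assume real Const nodes, so anything
       * else must return false. */
      static bool
      all_entries_are_const(const ExprSketch *tlist, size_t n)
      {
          for (size_t i = 0; i < n; i++)
              if (tlist[i].tag != T_CONST_SKETCH)
                  return false;
          return true;
      }
      ```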
      2b497f09
  15. 17 Aug 2017, 1 commit
    • Remove unused Plan.plan_parent_node_id field. · 5c155847
      Heikki Linnakangas authored
      This allows removing all the code in CTranslatorDXLToPlStmt that tracked
      the parent of each call.
      
      I found the plan node IDs awkward, when I was hacking on
      CTranslatorDXLToPlStmt. I tried to make a change where a function would
      construct a child Plan node first, and a Result node on top of that, but
      only if necessary, depending on the kind of child plan. The parent plan
      node IDs made it impossible to construct a part of Plan tree like that, in
      a bottom-up fashion, because you always had to pass the parent's ID when
      constructing a child node. Now that is possible.
      5c155847
  16. 12 Aug 2017, 1 commit
    • Inline helper function. · f29262bd
      Heikki Linnakangas authored
      The pattern of palloc'ing a Datum and isnull array is ubiquitous, no point
      in hiding it behind a function, especially when the function only has one
      caller.
      f29262bd
  17. 11 Aug 2017, 7 commits
  18. 09 Aug 2017, 7 commits
    • Include gp-libpq-int.h in cdbcopy.c · fdb5d6c3
      Pengzhou Tang authored
      cf7cddf7 conflicts with cc38f526: struct PQExpBufferData is needed by
      struct SegmentDatabaseDescriptor, so bring gp-libpq-int.h back.
      fdb5d6c3
    • Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Pengzhou Tang authored
      The whole cdb directory was shipped to end users, and all header files
      included by cdb*.h also need to be shipped to make checkinc.py pass.
      However, exposing gp_libpq_fe/*.h would confuse customers because
      those headers are almost the same as libpq/*. As Heikki suggested, we
      should keep gp_libpq_fe/* unchanged. So, to make the system work, we
      include gp-libpq-fe.h and gp-libpq-int.h directly in the .c files that
      need them.
      cf7cddf7
    • Add debug info for interconnect network timeout · 9a9cd48b
      Pengzhou Tang authored
      It was very difficult to tell whether the interconnect was stuck in
      the resend phase or whether there was UDP resend latency within the
      interconnect. To improve this, this commit records a debug message
      every Gp_interconnect_debug_retry_interval retries when
      gp_log_interconnect is set to DEBUG.
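      The rate-limiting idea, as a hedged sketch (names are illustrative,
      not the actual GUC variables):
      ```
      #include <stdio.h>

      /* Emit the debug message only once every 'retry_interval' resend
       * attempts, so a stuck connection stays visible without flooding
       * the log.  Returns 1 when a message is emitted. */
      static int
      maybe_log_retry(long retries, long retry_interval)
      {
          if (retry_interval > 0 && retries % retry_interval == 0)
          {
              fprintf(stderr, "interconnect: resend attempt %ld\n", retries);
              return 1;
          }
          return 0;
      }
      ```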
      9a9cd48b
    • Remove list_find_* functions. · 0340f543
      Heikki Linnakangas authored
      They don't exist in the upstream. All but one of the callers actually just
      needed list_member_*().
      0340f543
    • Remove unnecessary #includes · 4d573999
      Heikki Linnakangas authored
      4d573999
    • Remove unused function. · 9637bb1d
      Heikki Linnakangas authored
      9637bb1d
    • Replace special "QE details" protocol message with standard ParameterStatus msg. · d85257f7
      Heikki Linnakangas authored
      This gets rid of the GPDB-specific "QE details" message, that was only sent
      once at QE backend startup, to notify the QD about the motion listener port
      of the QE backend. Use a standard ParameterStatus message instead, pretending
      that there is a GUC called "qe_listener_port". This reduces the difference
      between the gp_libpq_fe copy of libpq, and libpq proper. I have a dream that
      one day we will start using the standard libpq also for QD-QE communication,
      and get rid of the special gp_libpq_fe copy altogether, and this is a small
      step in that direction.
      
      In passing, change the type of the Gp_listener_port variable from
      signed to unsigned. Gp_listener_port actually holds two values, the
      TCP and UDP listener ports, and there is bit-shifting code to store
      those two 16-bit port numbers in a single 32-bit integer. But the
      bit-shifting was a bit iffy on a signed integer; making it unsigned
      makes it clearer what's happening.
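      The packing scheme, sketched on an unsigned type where the shifts are
      well defined (a sketch of the idea, not the exact GPDB code):
      ```
      #include <stdint.h>

      /* Pack the TCP and UDP listener ports into one 32-bit value.  On an
       * unsigned integer the shifts are well defined; on a signed one,
       * shifting into the sign bit is murky. */
      static uint32_t
      pack_ports(uint16_t tcp_port, uint16_t udp_port)
      {
          return ((uint32_t) tcp_port << 16) | udp_port;
      }

      static uint16_t
      unpack_tcp(uint32_t packed)
      {
          return (uint16_t) (packed >> 16);
      }

      static uint16_t
      unpack_udp(uint32_t packed)
      {
          return (uint16_t) (packed & 0xFFFF);
      }
      ```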
      d85257f7
  19. 08 Aug 2017, 1 commit
    • Remove unnecessary use of PQExpBuffer. · cc38f526
      Heikki Linnakangas authored
      StringInfo is more appropriate in backend code. (Unless the buffer needs to
      be used in a thread.)
      
      In passing, rename the 'conn' static variable in cdbfilerepconnclient.c.
      It seemed overly generic.
      cc38f526