1. Feb 3, 2018: 1 commit
    • Vacuum fix for ERROR "updated tuple is already HEAP_MOVED_OFF". · aa5798a9
      Committed by Ashwin Agrawal
      `repair_frag()` should consult the distributed snapshot
      (`localXidSatisfiesAnyDistributedSnapshot()`) while following and moving chains
      of updated tuples. Vacuum consults the distributed snapshot to decide which
      tuples can be deleted and which cannot. For RECENTLY_DEAD tuples, however, it
      used to decide based solely on a comparison with OldestXmin, which is not
      sufficient; the distributed snapshot must be checked there as well.
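
      A minimal sketch of the visibility rule being fixed; tuple_is_really_dead()
      is a hypothetical helper for illustration, not the actual patch:

      ```c
      #include "postgres.h"
      #include "access/htup.h"
      #include "access/transam.h"

      /* hypothetical: a RECENTLY_DEAD tuple is truly dead only when no
       * local *or* distributed snapshot can still see its deleter */
      static bool
      tuple_is_really_dead(HeapTupleHeader tuple, TransactionId OldestXmin)
      {
          TransactionId xmax = HeapTupleHeaderGetXmax(tuple);

          /* the old, insufficient test: local comparison only */
          if (!TransactionIdPrecedes(xmax, OldestXmin))
              return false;

          /* the missing test: some distributed snapshot may still see xmax */
          if (localXidSatisfiesAnyDistributedSnapshot(xmax))
              return false;

          return true;
      }
      ```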
      
      Fixes #4298
      
      (cherry picked from commit 313ab24f)
  2. Feb 2, 2018: 2 commits
  3. Jan 31, 2018: 1 commit
    • Fix dispatching of queries with record-type parameters. · f24b9ab5
      Committed by Heikki Linnakangas
      This fixes the "ERROR:  record type has not been registered" error, when
      a record-type variable is used in a query inside a PL/pgSQL function.
      This is essentially the same problem we battled with in Motion nodes
      in GPDB 5, for which we added the whole tuple remapper. Only this
      time, the problem is with record Datums being dispatched from QD to QE,
      as Params, rather than with record Datums being transferred across a
      Motion.
      
      To fix, send the transient record type cache along with the query
      parameters, if any of the parameters are transient record types.
      This is a bit inefficient, as the transient record type cache can be quite
      large. A more fine-grained approach would be to send only those record
      types that are actually used in the parameters, but more code would be
      required to figure that out. This will do for now.
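
      A rough sketch of the detection step, assuming a simple scan over the
      parameter list; params_contain_transient_records() is illustrative, not the
      actual serialization code:

      ```c
      #include "postgres.h"
      #include "catalog/pg_type.h"    /* RECORDOID */
      #include "nodes/params.h"

      /* illustrative: decide whether the transient record type cache
       * must be shipped along with the dispatched parameters */
      static bool
      params_contain_transient_records(ParamListInfo params)
      {
          int i;

          for (i = 0; i < params->numParams; i++)
          {
              /* anonymous record types live in the transient type cache */
              if (params->params[i].ptype == RECORDOID)
                  return true;
          }
          return false;
      }
      ```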
      
      Refactor the serialization and deserialization of the query parameters, to
      leverage the outfast/readfast functions.
      
      Backport to 5X_STABLE. This changes the wire format of query parameters, so
      this requires the QD and QE to be on the same minor version. But this does
      not change the on-disk format, or the numbering of existing Node tags.
      
      Fixes github issue #4444.
  4. Jan 30, 2018: 1 commit
    • Alloc Instrumentation in Shmem · 9a0954e4
      Committed by Wang Hao
      On postmaster start, additional space in Shmem is allocated for Instrumentation
      slots and a header. The number of slots is controlled by a cluster-level GUC;
      the default is 5MB (approximately 30K slots), estimated from 250 concurrent
      queries * 120 nodes per query. If the slots are exhausted, instruments are
      allocated in local memory as a fallback.
      
      These slots are organized as a free list:
        - Header points to the first free slot.
        - Each free slot points to next free slot.
        - The last free slot's next pointer is NULL.
      
      ExecInitNode calls GpInstrAlloc to pick an empty slot from the free list:
        - The free slot pointed to by the header is picked.
        - The picked slot's next pointer is assigned to the header.
        - A spinlock on the header prevents concurrent writes.
        - When the GUC gp_enable_query_metrics is off, Instrumentation is
          allocated in local memory.
      
      Slots are recycled by a resource owner callback function.
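
      A simplified sketch of the slot allocation under the spinlock; InstrHeader,
      InstrSlot, and instr_alloc_from_shmem() are illustrative names, not the
      actual GpInstrAlloc implementation:

      ```c
      #include "postgres.h"
      #include "executor/instrument.h"
      #include "storage/spin.h"

      typedef struct InstrSlot
      {
          struct InstrSlot *next;     /* next free slot, NULL for the last one */
          Instrumentation   instr;
      } InstrSlot;

      typedef struct InstrHeader
      {
          slock_t    lock;            /* protects the free list head */
          InstrSlot *free_head;       /* first free slot */
      } InstrHeader;

      static Instrumentation *
      instr_alloc_from_shmem(InstrHeader *hdr)
      {
          InstrSlot *slot;

          SpinLockAcquire(&hdr->lock);
          slot = hdr->free_head;
          if (slot != NULL)
              hdr->free_head = slot->next;    /* pop the head of the list */
          SpinLockRelease(&hdr->lock);

          /* fall back to backend-local memory when the slots are exhausted */
          if (slot == NULL)
              return palloc0(sizeof(Instrumentation));

          return &slot->instr;
      }
      ```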
      
      A benchmark with TPC-DS shows the performance impact of this commit is less
      than 0.1%. To reduce instrumentation overhead, the following optimizations
      are added:
        - Introduce instrument_option to skip CDB info collection
        - Change tuplecount in Instrumentation from double to uint64
        - Replace the instrument tuple entry/exit functions with macros
        - Add need_timer to Instrumentation, to allow eliminating of timing overhead.
          This ports part of upstream commit:
          This is porting part of upstream commit:
      ------------------------------------------------------------------------
      commit af7914c6
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Tue Feb 7 11:23:04 2012 -0500
      
      Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
      ------------------------------------------------------------------------
      
      Author: Wang Hao <haowang@pivotal.io>
      Author: Zhang Teng <tezhang@pivotal.io>
  5. Jan 22, 2018: 2 commits
  6. Jan 18, 2018: 1 commit
  7. Dec 28, 2017: 1 commit
    • Able to cancel COPY PROGRAM ON SEGMENT if the program hangs · ecd44052
      Committed by Adam Lee
      There are two places where the QD keeps trying to get data, ignores SIGINT,
      and sends no signal to the QEs. If the program on a segment has no
      input/output, the COPY command hangs.
      
      To fix it, this commit:
      
      1, lets the QD wait until connections are readable before calling
      PQgetResult(), and cancels queries if it receives an interrupt signal while
      waiting (see the sketch below)
      2, sets DF_CANCEL_ON_ERROR when dispatching in cdbcopy.c
      3, completes COPY error handling
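
      A sketch of the QD-side wait in point 1, assuming a simple polling loop;
      connection_readable() stands in for the real libpq polling logic:

      ```c
      /* illustrative, not the actual cdbcopy.c code */
      for (;;)
      {
          if (QueryCancelPending)         /* SIGINT arrived while waiting */
          {
              PQrequestCancel(conn);      /* ask the QE to abort the COPY */
              break;
          }
          if (connection_readable(conn))  /* data is ready for PQgetResult() */
              break;
          pg_usleep(1000);                /* back off briefly, then re-check */
      }
      result = PQgetResult(conn);
      ```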
      
      -- prepare
      create table test(t text);
      copy test from program 'yes|head -n 655360';
      
      -- could be canceled
      copy test from program 'sleep 100 && yes test';
      copy test from program 'sleep 100 && yes test<SEGID>' on segment;
      copy test from program 'yes test';
      copy test to '/dev/null';
      copy test to program 'sleep 100 && yes test';
      copy test to program 'sleep 100 && yes test<SEGID>' on segment;
      
      -- should fail
      copy test from program 'yes test<SEGID>' on segment;
      copy test to program 'sleep 0.1 && cat > /dev/nulls';
      copy test to program 'sleep 0.1<SEGID> && cat > /dev/nulls' on segment;
      
      (cherry picked from commit 25c70407dc038a2c56ccb37a3540c9af6a99e6e4)
  8. Dec 7, 2017: 1 commit
    • Resend a cancel/finish signal if the QE didn't respond for a long time. · 58492956
      Committed by Pengzhou Tang
      Previously, the dispatcher sent the cancel/finish signal to QEs only once, so
      if the signal arrives before the query does, or is dropped by secure_read(),
      a QE may never get the chance to quit when it is assigned to execute a MOTION
      node whose peer has already been canceled.
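
      The shape of the fix, sketched with illustrative names (qe_has_quit(),
      resend_cancel_or_finish(), wait_for_qe_response(), and the interval are all
      stand-ins):

      ```c
      /* keep re-sending until the QE actually stops; a single signal
       * can be lost in transit or swallowed by secure_read() */
      while (!qe_has_quit(segdb))
      {
          resend_cancel_or_finish(segdb);

          /* wait a bounded time for a response; on timeout, loop and resend */
          (void) wait_for_qe_response(segdb, RESEND_INTERVAL_MS);
      }
      ```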
      
      This fixes issue #3950
  9. Nov 9, 2017: 1 commit
  10. Nov 8, 2017: 1 commit
    • Do a force flush before checking the result of a connection · 4661b620
      Committed by Pengzhou Tang
      Previously, to speed up dispatching, cdbdisp_dispatchToGang_async
      and cdbdisp_waitDispatchFinish_async were designed to use non-blocking
      flushes to dispatch commands in bulk. However, there is a risk that some
      commands are not fully dispatched in corner error cases, so the QD must
      do a forced flush before handling such connections; otherwise the QD will
      get stuck.
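
      A sketch of the forced flush; PQflush() is real libpq API (0 = done,
      1 = more data queued, -1 = failure), while wait_socket_writable() and
      handle_connection_error() are stand-ins:

      ```c
      /* force out any partially-sent command before reading results */
      int rc;

      while ((rc = PQflush(conn)) == 1)
      {
          /* wait until the socket is writable, then try flushing again */
          if (!wait_socket_writable(PQsocket(conn)))
              break;
      }
      if (rc == -1)
          handle_connection_error(conn);
      ```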
  11. Nov 6, 2017: 1 commit
    • Symlink libpq files for backend and optimize the makefiles · 5bc459c1
      Committed by Adam Lee
      src/backend's makefiles have their own rules; this commit symlinks the libpq
      files for the backend to leverage them, which is canonical and much simpler.
      
      What are the rules?
      
      1, src/backend compiles each SUBDIR, lists the OBJS in the sub-directories'
      objfiles.txt, then links them all into postgres.

      2, mock.mk links all OBJS, but filters out the objects that are mocked by
      test cases.
      
      (cherry picked from commit 1e9cd7d9)
  12. Nov 2, 2017: 1 commit
    • Don't pass around MemTuples as HeapTuples. · 06e7a09f
      Committed by Heikki Linnakangas
      Invent a new pointer type, GenericTuple, for when we might be dealing with
      either a MemTuple or a HeapTuple. The old practice of holding a MemTuple
      in a HeapTuple-typed variable, or passing a MemTuple to a function that's
      declared to take a HeapTuple parameter, seemed dangerous.
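
      A sketch of the pattern; GenericTuple and is_memtuple() are the GPDB names
      this change relies on, while the process_*() callers are hypothetical:

      ```c
      /* a distinct pointer type makes the dual tuple format explicit */
      typedef struct GenericTupleData *GenericTuple;

      static void
      process_tuple(GenericTuple gtup, TupleDesc tupdesc)
      {
          if (is_memtuple(gtup))
              process_memtuple((MemTuple) gtup, tupdesc);
          else
              process_heaptuple((HeapTuple) gtup, tupdesc);
      }
      ```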
  13. Oct 30, 2017: 2 commits
    • Retire gp_libpq_fe part 2, changing the include path · 6f650ed5
      Committed by Adam Lee
      Signed-off-by: Adam Lee <ali@pivotal.io>
    • Retire gp_libpq_fe part 1, libpq itself · 8f9fcd73
      Committed by Adam Lee
          commit b0328d5631088cca5f80acc8dd85b859f062ebb0
          Author: mcdevc <a@b>
          Date:   Fri Mar 6 16:28:45 2009 -0800
      
              Separate our internal libpq front end from the client libpq library
              upgrade libpq to the latest to pick up bug fixes and support for more
              client authentication types (GSSAPI, KRB5, etc)
              Upgrade all files dependent on libpq to handle new version.
      
      Above is the initial commit of gp_libpq_fe; there seems to be no good reason
      to still have it.
      
      Key things this PR does:
      
      1, removes the gp_libpq_fe directory.
      2, builds the libpq source code into two versions, for frontend and backend,
      distinguished by the FRONTEND macro.
      3, libpq for the backend still bypasses local authentication, SSL, and some
      environment variables; these are the only differences.
      
      (backported from 510a20b6, with some
      fixes for SUSE and Windows)
      Signed-off-by: Adam Lee <ali@pivotal.io>
  14. Oct 27, 2017: 1 commit
    • Fix has_external_partition() incompatible pointer type warnings · 15acffab
      Committed by Adam Lee
      cdbpartition.c: In function ‘rel_has_external_partition’:
      cdbpartition.c:489:32: warning: passing argument 1 of ‘has_external_partition’ from incompatible pointer type [-Wincompatible-pointer-types]
        return has_external_partition(n->rules);
                                      ^
      cdbpartition.c:106:13: note: expected ‘PartitionRule * {aka struct PartitionRule *}’ but argument is of type ‘List * {aka struct List *}’
       static bool has_external_partition(PartitionRule *rules);
                   ^~~~~~~~~~~~~~~~~~~~~~
  15. Oct 26, 2017: 1 commit
    • Support exchange sub partition with external table · 5a415e5e
      Committed by Xiaoran Wang
      For example:
      
          alter table tableA alter partition partition1
          exchange partition partition1_subpartition1 with table external_table;
      
      partition1 is the first level partition of tableA.
      partition1_subpartition1 is a sub partition of partition1.
      
      1) The routine ATPExecPartExchange in tablecmds.c locates the target
      partition's table through the parameters following the 'alter' keyword.
      Then, no matter what level the partition is, the routine exchanges the
      partition with the new table in the same way, so exchanging a sub partition
      with an external table now works well.
      
      2) Queries against partitioned tables that are altered to
      use an external table as a leaf child partition fall back
      to the legacy query optimizer. By calling the routine
      rel_has_external_partition in cdbpartition.c, GPORCA knows
      whether the table has an external partition. However, the routine
      only searched the first-level partition children, returning
      false when a sub partition is an external table. This commit
      fixes it by searching every partition in the table.
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
  16. Oct 10, 2017: 1 commit
    • Fix multistage aggregation final target list · 219f226e
      Committed by Bhuvnesh Chaudhary
      If a target list entry is found under a RelabelType node, the newly created
      Var node should be nested inside the RelabelType node if the vartype
      of the Var node differs from the resulttype of the RelabelType node.
      Otherwise, the cast information is lost and the executor complains about a
      type mismatch:
      	```sql
      	CREATE TABLE t1 (a varchar, b character varying) DISTRIBUTED RANDOMLY;
      	SELECT array_agg(f)  FROM (SELECT b::text as f FROM t1 GROUP BY b) q;
      	ERROR:  attribute 1 has wrong type (execQual.c:763)  (seg0 slice2 127.0.0.1:25432 pid=7064)
      	DETAIL:  Table has type character varying, but query expects text.
      	```
  17. Oct 9, 2017: 1 commit
    • Decouple GUC max_resource_groups and max_connections. · 2fe7c8d2
      Committed by Richard Guo
      Previously there was a restriction on the GUC 'max_resource_groups'
      that it could not be larger than 'max_connections'.
      This restriction could cause gpdb to fail to start if the two GUCs
      were not set properly.
      We decided to decouple these two GUCs and set a hard limit
      of 100 for 'max_resource_groups'.
  18. Sep 26, 2017: 1 commit
  19. Sep 25, 2017: 1 commit
    • Report COPY PROGRAM's error output · 8e23d64e
      Committed by Adam Lee
      Replace popen() with popen_with_stderr(), which is also used by external web
      tables, to collect the program's stderr output.

      Since popen_with_stderr() forks a `sh` process, it is almost always
      successful, so this commit catches errors that happen in fwrite(), as
      sketched below.

      It also passes variables the same way external web tables do.
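
      A sketch of the fwrite() check, with illustrative names for the pipe and
      buffer:

      ```c
      /* the fork of `sh` almost always succeeds, so failures surface here */
      size_t written = fwrite(buf, 1, len, pipe_to_program);

      if (written != len)
          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not write to COPY program: %m")));
      ```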
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
      (cherry picked from commit 2b51c16b)
  20. Sep 21, 2017: 1 commit
    • Fix multistage aggregation plan targetlists · ad166563
      Committed by Bhuvnesh Chaudhary
      If an aggregation query uses aliases that match the table's actual columns,
      the aliases are propagated up from subqueries, and grouping is applied on
      the column alias, the aggregation plan may end up with inconsistent
      targetlists, causing a crash.
      
      	CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;
      	SELECT substr(a, 2) as a
      	FROM
      		(SELECT ('-'||a)::varchar as a
      			FROM (SELECT a FROM t1) t2
      		) t3
      	GROUP BY a;
  21. Sep 20, 2017: 1 commit
    • Cherry-pick psprintf() function from upstream, and use it. · 12c7f256
      Committed by Heikki Linnakangas
      This makes constructing strings a lot simpler, and less scary. I changed
      many places in GPDB code to use the new psprintf() function, where it
      seemed to make most sense. A lot of code remains that could use it, but
      there's no urgency.
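
      For reference, psprintf() formats into a freshly palloc'd buffer of exactly
      the right size; a before/after sketch (dir and file are placeholders):

      ```c
      /* old style: guess a buffer size and hope it is large enough */
      char buf[MAXPGPATH];
      snprintf(buf, sizeof(buf), "%s/%s", dir, file);

      /* new style: psprintf() sizes the palloc'd result automatically */
      char *path = psprintf("%s/%s", dir, file);
      /* ... use path, then pfree(path) if the context lives long ... */
      ```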
      
      I avoided changing upstream code to use it yet, even where it would make
      sense, to avoid introducing unnecessary merge conflict.
      
      The biggest changes are in cdbbackup.c, where the code to count the buffer
      sizes was really complex. I also refactored the #ifdef USE_DDBOOST
      blocks so that there is less repetition between the USE_DDBOOST and
      !USE_DDBOOST blocks; that should make it easier to catch, when compiling
      with USE_DDBOOST, bugs that affect the !USE_DDBOOST case, and vice versa.
      I also switched to using pstrdup() instead of strdup() in a few places,
      to avoid memory leaks. (Although the way cdbbackup works, it would only
      get launched once per connection, so it didn't really matter in practice.)
  22. Sep 14, 2017: 1 commit
  23. Sep 12, 2017: 2 commits
    • Fix wrong results for NOT-EXISTS sublinks with aggs & LIMIT · d8c7b947
      Committed by Shreedhar Hardikar
      During NOT EXISTS sublink pullup, we create a one-time false filter when
      the sublink contains aggregates, without checking the limit count. However,
      when the sublink contains an aggregate with LIMIT 0, we should not generate
      such a filter, as it produces incorrect results.
      
      Added regress test.
      
      Also, initialize all the members of IncrementVarSublevelsUp_context
      properly.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Refactor adding explicit distribution motion logic · 75339b9f
      Committed by Bhuvnesh Chaudhary
      nMotionNodes tracks the number of Motion nodes in a plan, and each
      plan node maintains it. Counting the Motion nodes in a plan node by
      traversing the tree and adding up the nMotionNodes found in nested plans
      gives an incorrect count. So instead of using nMotionNodes, use
      a boolean flag to track whether the subtree, excluding the initplans,
      contains a Motion node.
  24. Sep 8, 2017: 1 commit
  25. Sep 7, 2017: 2 commits
    • Force a stand-alone backend to run in utility mode. · abedbc23
      Committed by Heikki Linnakangas
      In a stand-alone backend ("postgres --single"), you cannot realistically
      expect any of the infrastructure needed for MPP processing to be present.
      Let's force a stand-alone backend to run in utility mode, to make sure
      that we don't try to dispatch queries, participate in distributed
      transactions, or anything like that, in a stand-alone backend.
      
      Fixes github issue #3172, which was one such case where we tried to
      dispatch a SET command in single-user mode, and got all confused.
    • Bring in recursive CTE to GPDB · 546fa1f6
      Committed by Haisheng Yuan
      The planner generates plans that don't insert any Motion between a
      WorkTableScan and its corresponding RecursiveUnion, because Motions are
      currently not rescannable in GPDB. For example, an MPP plan for a recursive
      CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      In the current solution, the WorkTableScan is always put on the outer side of
      the topmost Join (the recursive part of the RecursiveUnion), so that we can
      safely rescan the inner child of the join without worrying about the
      materialization of a potential underlying Motion. This is a heuristic-based
      plan, not a cost-based plan.
      
      Ideally, the WorkTableScan could be placed on either side of the join at any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: hash join is temporarily disabled when generating the plan for the
      recursive part, because if the hash table spills, the batch file is removed
      as it executes. We have a follow-up story to make spilled hash tables
      rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
  26. Sep 6, 2017: 1 commit
    • Fix reuse of cached plans in user-defined functions. · 9fc02221
      Committed by Heikki Linnakangas
      CdbDispatchPlan() was making a copy of the plan tree, in the same memory
      context as the old plan tree was in. If the plan came from the plan cache,
      the copy will also be stored in the CachedPlan context. That means that
      every execution of the cached plan will leak a copy of the plan tree in
      the long-lived memory context.
      
      Commit 8b693868 fixed this for cached plans being used directly with
      the extended query protocol, but it did not fix the same issue with plans
      being cached as part of a user-defined function. To fix this properly,
      revert the changes to exec_bind_message, and instead in CdbDispatchPlan,
      make the copy of the plan tree in a short-lived memory context.
      
      Aside from the memory leak, it was never a good idea to change the original
      PlannedStmt's planTree pointer to point to the modified copy of the plan
      tree. That copy has had all the parameters replaced with their current
      values, but on the next execution, we should do that replacement again. I
      think that happened to not be an issue, because we had code elsewhere that
      forced re-planning of all queries anyway. Or maybe it was in fact broken.
      But in any case, stop scribbling on the original PlannedStmt, which might
      live in the plan cache, and make a temporary copy that we can freely
      scribble on in CdbDispatchPlan, that's only used for the dispatch.
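
      A sketch of the corrected shape, assuming a per-query context is at hand
      (estate->es_query_cxt here; the exact context used by the patch may differ):

      ```c
      /* copy the cached plan into short-lived memory before scribbling */
      MemoryContext oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
      PlannedStmt  *dispatchStmt = (PlannedStmt *) copyObject(stmt);

      MemoryContextSwitchTo(oldcxt);

      /* substitute current parameter values into dispatchStmt and dispatch
       * it; the original stmt, possibly owned by the plan cache, stays
       * untouched and the copy dies with the short-lived context */
      ```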
  27. Sep 5, 2017: 1 commit
    • Simplify tuple serialization in Motion nodes. · 11e4aa66
      Committed by Ning Yu
      
      There is a fast path for tuples that contain no toasted attributes,
      which writes the raw tuple almost as is. The slow path, however, is
      significantly more complicated, calling each attribute's binary
      send/receive functions (although there is a fast path for a few
      built-in datatypes). I don't see any need for calling I/O functions
      here; we can just write the raw Datum on the wire. If that works
      for tuples with no toasted attributes, it should work for all tuples,
      if we just detoast any toasted attributes first.
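
      A sketch of the simplified slow path; slot, attno, and att are assumed to
      be in scope, and the actual buffer append is elided:

      ```c
      bool  isnull;
      Datum value = slot_getattr(slot, attno, &isnull);

      /* varlena attributes (attlen == -1) may be toasted: flatten them first */
      if (!isnull && att->attlen == -1)
          value = PointerGetDatum(PG_DETOAST_DATUM(value));

      /* ... then write the raw datum bytes to the Motion send buffer ... */
      ```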
      
      This makes the code a lot simpler, and also fixes a bug with data
      types that don't have binary send/receive routines. We used to
      call the regular (text) I/O functions in that case, but didn't handle
      the resulting cstring correctly.
      
      Diagnosis and test case by Foyzur Rahman.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
  28. Sep 1, 2017: 3 commits
    • Fix Copyright and file headers across the tree · ed7414ee
      Committed by Daniel Gustafsson
      This bumps the copyright years to the appropriate years after not
      having been updated for some time.  Also reformats existing code
      headers to match the upstream style to ensure consistency.
    • Set errcode on AO checksum errors · 60f9ac3d
      Committed by Daniel Gustafsson
      The missing errcode makes the ereport call include the line number
      of the invocation in the .c file, which not only isn't very useful
      but also causes the tests to fail when code is added to or removed
      from the file.
    • Always read returnvalue from stat calls · 742e5415
      Committed by Daniel Gustafsson
      {f}stat() can fail, and reading the stat buffer without checking the
      return status is bad hygiene. Always test the return value and take
      the appropriate error path in case of a stat error.
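
      The pattern being enforced, sketched with PostgreSQL's error-reporting
      idiom (path and the follow-up use are placeholders):

      ```c
      struct stat st;

      if (stat(path, &st) < 0)
          ereport(ERROR,
                  (errcode_for_file_access(),
                   errmsg("could not stat file \"%s\": %m", path)));

      /* st is only inspected once the call is known to have succeeded */
      if (S_ISREG(st.st_mode))
          elog(DEBUG1, "\"%s\" is a regular file", path);
      ```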
  29. Aug 30, 2017: 2 commits
    • Remove misc unused code. · 37d2a5b3
      Committed by Heikki Linnakangas
      'nuff said.
    • Eliminate '#include "utils/resowner.h"' from lock.h · 6b25c0a8
      Committed by Heikki Linnakangas
      It was getting in the way of backporting commit 9b1b9446f5 from PostgreSQL,
      which added an '#include "storage/lock.h"' to resowner.h, forming a cycle.
      
      The include was only needed for the declaration of the awaitedOwner global
      variable. Replace "ResourceOwner" with the equivalent "struct
      ResourceOwnerData *" to avoid it.
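
      The trick, sketched: a pointer to an incomplete struct type needs no full
      definition, so the header no longer has to include resowner.h:

      ```c
      /* before: required '#include "utils/resowner.h"' for the typedef */
      extern ResourceOwner awaitedOwner;

      /* after: only a forward-declared struct pointer, no include needed
       * (ResourceOwner is 'typedef struct ResourceOwnerData *ResourceOwner') */
      extern struct ResourceOwnerData *awaitedOwner;
      ```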
      
      This revealed a bunch of other files that were relying on resowner.h
      being indirectly included through lock.h. Include resowner.h directly
      in those files.
      
      The ResPortalIncrement.owner field was not used for anything, so instead
      of including resowner.h in that file, just remove the field that needed
      it.
  30. Aug 29, 2017: 1 commit
    • Perform resource group operations only when it's initialized · 939208b5
      Committed by Pengzhou Tang
      Resource groups are enabled but not initialized in auxiliary processes
      and special backends like ftsprobe and filerep. Previously, we performed
      resource group operations regardless of whether the resource group was
      initialized, which led to unexpected errors.
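
      The shape of the guard, sketched; IsResGroupEnabled() and
      IsResGroupActivated() stand in for whatever checks the actual patch uses
      for "enabled" and "initialized":

      ```c
      /* illustrative: skip resource group work in processes that never
       * initialized the subsystem (ftsprobe, filerep, auxiliary processes) */
      void
      resgroup_operation_guarded(void)
      {
          if (!IsResGroupEnabled() || !IsResGroupActivated())
              return;

          /* ... the actual resource group operation ... */
      }
      ```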
  31. Aug 28, 2017: 2 commits
    • Avoid side effects in assertions · 288dde95
      Committed by Daniel Gustafsson
      An assertion with a side effect may alter the main codepath when
      the tree is built with --enable-cassert, which in turn may lead
      to subtle differences due to compiler optimizations and/or straight
      bugs in the side effect. Rewrite the assertions without side
      effects to leave the main codepath intact.
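
      An illustrative example of the hazard; with assertions compiled out, the
      first form never performs the append at all:

      ```c
      /* BAD: the side effect disappears in non-cassert builds */
      Assert((lst = lappend(lst, item)) != NULL);

      /* GOOD: do the work unconditionally, assert about the result */
      lst = lappend(lst, item);
      Assert(lst != NULL);
      ```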
    • Optimize `COPY TO ON SEGMENT` result processing · 266355d3
      Committed by Adam Lee
      Don't send nonsense '\n' characters just for counting; let segments
      report how many rows were processed instead.
      Signed-off-by: Ming LI <mli@apache.org>