1. 11 Aug 2017, 3 commits
  2. 09 Aug 2017, 5 commits
    • Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Committed by Pengzhou Tang
      The whole cdb directory was shipped to end users, so every header file
      that the cdb*.h headers include also had to be shipped to make
      checkinc.py pass. However, exposing gp_libpq_fe/*.h would confuse
      customers because those headers are almost the same as libpq/*. Per
      Heikki's suggestion, we keep gp_libpq_fe/* unchanged; so, to make the
      system work, we include gp-libpq-fe.h and gp-libpq-int.h directly in
      the .c files that need them.
    • Add debug info for interconnect network timeout · 9a9cd48b
      Committed by Pengzhou Tang
      It was very difficult to verify whether the interconnect was stuck in
      the resending phase, or whether there was UDP resend latency within
      the interconnect. To make this easier to diagnose, this commit records
      a debug message every Gp_interconnect_debug_retry_interval retries
      when gp_log_interconnect is set to DEBUG.
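
      A minimal sketch of the kind of check this adds; the function and
      variable names here are illustrative, not the exact ones in the GPDB
      source, and the real check is additionally gated on
      gp_log_interconnect being set to DEBUG:

          #include "postgres.h"

          extern int Gp_interconnect_debug_retry_interval;

          /* Illustrative: log a hint every N retransmissions, so a stuck
           * resend loop shows up in the server log. */
          static void
          maybe_log_retry(int route, unsigned int pkt_seq, int retries)
          {
              if (retries > 0 &&
                  retries % Gp_interconnect_debug_retry_interval == 0)
                  elog(DEBUG1,
                       "interconnect resending: route %d, seq %u, retries %d",
                       route, pkt_seq, retries);
          }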
    • Move function prototype to where it belongs. · 15ae5084
      Committed by Heikki Linnakangas
      These functions are defined in cdbpartition.c, not tablecmds.c. The
      prototypes should be in the matching header file.
    • Remove unused function. · 9637bb1d
      Committed by Heikki Linnakangas
    • Replace special "QE details" protocol message with standard ParameterStatus msg. · d85257f7
      Committed by Heikki Linnakangas
      This gets rid of the GPDB-specific "QE details" message, which was
      only sent once at QE backend startup to notify the QD about the motion
      listener port of the QE backend. Use a standard ParameterStatus
      message instead, pretending that there is a GUC called
      "qe_listener_port". This reduces the difference between the
      gp_libpq_fe copy of libpq and libpq proper. I have a dream that one
      day we will start using the standard libpq for QD-QE communication as
      well, and get rid of the special gp_libpq_fe copy altogether; this is
      a small step in that direction.

      In passing, change the type of the Gp_listener_port variable from
      signed to unsigned. Gp_listener_port actually holds two values, the
      TCP and UDP listener ports, and there is bit-shifting code to store
      those two 16-bit port numbers in the single 32-bit integer. But the
      bit-shifting was a bit iffy on a signed integer; making it unsigned
      makes it clearer what's happening.
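
      A sketch of the packing described above, in plain C. The macro and
      function names are illustrative, and which half holds which port is an
      assumption:

          #include <stdint.h>

          /* Two 16-bit listener ports packed into one 32-bit value. On an
           * unsigned integer the shifts are well defined; with a signed
           * int, shifting into the sign bit is murky. */
          static uint32_t
          pack_listener_ports(uint16_t tcp_port, uint16_t udp_port)
          {
              return ((uint32_t) udp_port << 16) | tcp_port;
          }

          #define TCP_PORT(x) ((uint16_t) ((x) & 0xFFFF))
          #define UDP_PORT(x) ((uint16_t) (((x) >> 16) & 0xFFFF))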
  3. 08 Aug 2017, 1 commit
    • Remove unnecessary use of PQExpBuffer. · cc38f526
      Committed by Heikki Linnakangas
      StringInfo is more appropriate in backend code. (Unless the buffer
      needs to be used in a thread.)

      In passing, rename the 'conn' static variable in
      cdbfilerepconnclient.c. It seemed overly generic.
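
      For reference, the StringInfo idiom in backend code looks like this; a
      small self-contained sketch using the standard PostgreSQL API:

          #include "postgres.h"
          #include "lib/stringinfo.h"

          static void
          build_message(const char *name, int value)
          {
              StringInfoData buf;

              /* StringInfo allocates from the current memory context,
               * which is why it is backend-only (not thread-safe). */
              initStringInfo(&buf);
              appendStringInfo(&buf, "%s = %d", name, value);
              elog(DEBUG1, "%s", buf.data);
              pfree(buf.data);
          }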
  4. 02 Aug 2017, 1 commit
    • Make memory spill in resource group take effect · 68babac4
      Committed by Richard Guo
      Resource group memory spill is similar to 'statement_mem' in resource
      queues; the difference is that the memory spill is calculated from the
      memory quota of the resource group.

      The related GUCs, variables and functions shared by both resource
      queues and resource groups are moved into the resource manager
      namespace.

      The resource queue code relating to memory policy is also refactored
      in this commit.
      Signed-off-by: Pengzhou Tang <ptang@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
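
      A rough sketch of the calculation described above. This only
      illustrates the idea that the spill budget is derived from the group's
      memory quota, scaled by its memory_spill_ratio option and shared among
      the group's concurrency slots; the exact formula in GPDB is more
      involved:

          /* Illustrative only: per-query spill memory derived from the
           * group quota, a spill ratio percentage, and the number of
           * concurrency slots. */
          static int
          memory_spill_kb(int group_quota_kb, int spill_ratio_pct,
                          int concurrency)
          {
              return group_quota_kb * spill_ratio_pct / 100 / concurrency;
          }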
  5. 31 Jul 2017, 1 commit
    • Implement "COPY ... FROM ... ON SEGMENT" · e254287e
      Committed by Ming LI
      Support a COPY statement that imports data files on the segments
      directly, in parallel. It can be used to import data files generated
      by "COPY ... TO ... ON SEGMENT".

      This commit also supports all the data file formats that "COPY ... TO"
      supports, processes reject limit numbers, and logs errors accordingly.

      Key workflow:
         a) For COPY FROM, nothing is changed by this commit: dispatch the
         modified COPY command to the segments first, then read the data
         file on the master and dispatch the data to the relevant segment
         for processing.

         b) For COPY FROM ON SEGMENT, on the QD, read a dummy data file and
         keep the other parts unchanged; on the QE, process the (empty) data
         stream dispatched from the QD first, then re-run the same workflow
         to read and process the local segment data file.
      Signed-off-by: Ming LI <mli@pivotal.io>
      Signed-off-by: Adam Lee <ali@pivotal.io>
      Signed-off-by: Haozhou Wang <hawang@pivotal.io>
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
  6. 21 Jul 2017, 1 commit
    • Improve partition selection logging (#2796) · 038aa959
      Committed by Jesse Zhang
      Partition selection is the process of determining at runtime
      ("execution time") which leaf partitions we can skip scanning. Three
      types of Scan operators benefit from partition selection:
      DynamicTableScan, DynamicIndexScan, and BitmapTableScan.

      Currently, there is a minimal amount of logging about which partitions
      are selected, and it is scattered between DynamicIndexScan and
      DynamicTableScan (so we missed BitmapTableScan).

      This commit moves the logging into the PartitionSelector operator
      itself, at the point where it exhausts its inputs. This also brings
      the nice side effect of more granular information: the log now
      attributes the partition selection to individual partition selectors.
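
      A sketch of what the consolidated log line might look like at the
      point where a PartitionSelector exhausts its input; the function,
      message, and log level are illustrative, not the actual code:

          #include "postgres.h"

          /* Illustrative: one report per selector, attributing the
           * selection to it rather than to the downstream scan. */
          static void
          log_selected_parts(int selector_id, int nselected, int ntotal)
          {
              elog(DEBUG2,
                   "PartitionSelector %d: selected %d of %d leaf partitions",
                   selector_id, nselected, ntotal);
          }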
  7. 14 Jul 2017, 1 commit
    • Replay of AO XLOG records · cc9131ba
      Committed by Ashwin Agrawal
      We generate AO XLOG records when --enable-segwalrep is configured. We
      should now replay those records on the mirror or during recovery. The
      replay is only performed in standby mode, since promotion will not
      execute until there are no more XLOG records to read from the WAL
      stream.
  8. 13 Jul 2017, 2 commits
    • Use block number instead of LSN to batch changed blocks in filerep · abe13c79
      Committed by Asim R P
      The filerep resync logic that fetches changed blocks from the
      changetracking (CT) log is changed. The LSN is no longer used to
      filter out blocks from the CT log. If a relation's changed blocks
      exceed the threshold number of blocks that can be fetched at a time,
      the last fetched block number is remembered and used to form the
      subsequent batch.
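
      A minimal sketch of batching by a remembered block number, with a
      hypothetical fetch function standing in for the CT-log lookup:

          typedef unsigned int BlockNumber;

          /* Hypothetical: returns up to max_blocks changed block numbers
           * at or after start_from, in ascending order; 0 when done. */
          extern int fetch_changed_blocks(BlockNumber start_from,
                                          int max_blocks,
                                          BlockNumber *out_blocks);

          /* Illustrative: resume each batch from the block number after
           * the last one fetched (assumes max_blocks <= 1024). */
          static void
          resync_relation(int max_blocks)
          {
              BlockNumber resume_from = 0;
              BlockNumber blocks[1024];
              int         n;

              while ((n = fetch_changed_blocks(resume_from, max_blocks,
                                               blocks)) > 0)
              {
                  /* ... sync the n blocks to the mirror ... */
                  resume_from = blocks[n - 1] + 1;
              }
          }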
    • Add GUC to control number of blocks that a resync worker operates on · 2960bd7c
      Committed by Asim R P
      The GUC gp_changetracking_max_rows replaces a compile-time constant. A
      resync worker obtains at most gp_changetracking_max_rows changed
      blocks from the changetracking log at one time. Controlling this with
      a GUC makes it possible to exercise bugs in the resync logic around
      this area.
  9. 11 Jul 2017, 2 commits
    • Speed up simple queries on AOCS tables, when only a few columns are needed. · 3c40df0a
      Committed by Heikki Linnakangas
      If you have a query like "SELECT COUNT(col1) FROM wide_table", where
      the table has dozens of columns, the overhead in aocs_getnext() just
      to figure out which columns need to be fetched becomes noticeable.
      Optimize it.
    • Remove unused field. · 781ff5b7
      Committed by Heikki Linnakangas
      This does mean that we don't free the array quite as quickly as we
      used to, but it's a drop in the sea. The array is very small, there
      are much bigger data structures involved in every AOCS scan that are
      not freed as quickly, and it's freed at the end of the query in any
      case.
  10. 07 Jul 2017, 1 commit
    • Remove unused variable in checkpoint record. · f737c2d2
      Committed by Ashwin Agrawal
      The segmentCount variable is unused in the TMGXACT_CHECKPOINT
      structure, hence remove it. Also remove the union in the
      fspc_agg_state, tspc_agg_state and dbdir_agg_state structures, as
      there is no reason for it.
  11. 29 Jun 2017, 1 commit
    • Implement resgroup memory limit (#2669) · b5e1fb0a
      Committed by Ning Yu
      Implement the resgroup memory limit.

      In a resgroup, we divide the memory into several slots; the number
      depends on the concurrency setting of the resgroup. Each slot has a
      reserved quota of memory, and all the slots also share some shared
      memory which can be acquired preemptively.

      Some GUCs and resgroup options are defined to adjust the exact
      allocation policy:

      resgroup options:
      - memory_shared_quota
      - memory_spill_ratio

      GUCs:
      - gp_resource_group_memory_limit
      Signed-off-by: Ning Yu <nyu@pivotal.io>
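
      A rough sketch of the slot split described above, under the assumption
      that memory_shared_quota is the fraction of the group's memory set
      aside as the shared pool and the rest is divided evenly among the
      concurrency slots (the real accounting in GPDB is more detailed):

          /* Illustrative: split a group's memory into a shared pool plus
           * per-slot reserved quotas. */
          typedef struct GroupMemory
          {
              int shared_kb;      /* preemptively acquirable by any slot */
              int per_slot_kb;    /* reserved quota of each slot */
          } GroupMemory;

          static GroupMemory
          split_group_memory(int group_kb, double memory_shared_quota,
                             int concurrency)
          {
              GroupMemory m;

              m.shared_kb = (int) (group_kb * memory_shared_quota);
              m.per_slot_kb = (group_kb - m.shared_kb) / concurrency;
              return m;
          }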
  12. 28 Jun 2017, 1 commit
    • Destroy dangling Gang if interrupted during creation. (#2696) · 90f59f88
      Committed by Kenan Yao
      If the QD receives a SIGINT and calls CHECK_FOR_INTERRUPTS after
      finishing Gang creation, but before recording the Gang in global
      variables like primaryWriterGang, the Gang would not be destroyed.
      The next time the QD wants to create a new writer Gang, it would find
      the existing writer Gang on the segments and report a snapshot
      collision error.
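
      The usual backend pattern for closing this kind of window is to
      release the resource if an error is raised before it is recorded
      anywhere; a sketch with illustrative gang types and functions, not
      the commit's exact code:

          #include "postgres.h"

          typedef struct Gang Gang;   /* opaque here; illustrative */
          extern Gang *createWriterGang(void);
          extern void DisconnectAndDestroyGang(Gang *gp);

          static Gang *primaryWriterGang = NULL;

          static void
          create_writer_gang_safely(void)
          {
              Gang *gang = createWriterGang();

              PG_TRY();
              {
                  /* Previously, an interrupt raised here leaked the gang:
                   * it existed on the segments but was recorded nowhere. */
                  CHECK_FOR_INTERRUPTS();
                  primaryWriterGang = gang;
              }
              PG_CATCH();
              {
                  DisconnectAndDestroyGang(gang);
                  PG_RE_THROW();
              }
              PG_END_TRY();
          }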
  13. 27 Jun 2017, 1 commit
  14. 13 Jun 2017, 1 commit
    • Checksum protect filerep change tracking files. · a0cc23d4
      Committed by Ashwin Agrawal
      Change tracking files are used to capture information on what changed
      while the mirror was down, to help incrementally bring it back in
      sync. In some instances, mostly due to disk issues or disk-full
      situations, if the change tracking log was partially written or
      otherwise corrupted, the result was a rolling PANIC of the segment,
      and thereby an unavailable database due to a double fault. The only
      way out was manual intervention to remove the changetracking files
      and run a full resync.

      Instead, this commit adds checksum protection to automatically detect
      any problem with the change tracking files during recovery or
      incremental resync. If a checksum mismatch is detected, it takes the
      preventive action of marking the segment in the ChangeTrackingDisabled
      state and keeps the database available. It also explicitly enforces
      that only a full recovery may bring the mirror back in sync, since the
      changetracking info no longer exists; any attempt at an incremental
      resync clearly communicates that a full resync has to be performed.
      This eliminates the need for manual intervention to get the database
      back to an available state if the changetracking files get corrupted.
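
      A sketch of the verify-on-read pattern this implies; the header
      layout and checksum routine here are hypothetical, not the actual
      on-disk format:

          #include <stddef.h>
          #include <stdint.h>

          /* Hypothetical record header: checksum covers the payload. */
          typedef struct CTRecordHeader
          {
              uint32_t checksum;
              uint32_t payload_len;
          } CTRecordHeader;

          extern uint32_t compute_checksum(const void *data, size_t len);

          /* Returns 1 if the record is intact; 0 means mark the segment
           * ChangeTrackingDisabled and require a full resync. */
          static int
          ct_record_valid(const CTRecordHeader *hdr, const void *payload)
          {
              return compute_checksum(payload, hdr->payload_len)
                     == hdr->checksum;
          }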
  15. 09 Jun 2017, 1 commit
  16. 07 Jun 2017, 2 commits
    • Fix an unexpected cursor error under TCP interconnect · 09aca5fa
      Committed by Pengzhou Tang
      Under the former TCP interconnect, after declaring a cursor for an
      invalid query like "declare c1 cursor for select c1/0 from foo", the
      following FETCH command could still fetch an empty row instead of an
      error. This is incorrect and inconsistent with the UDP interconnect.

      The root cause is that senders in the TCP interconnect always send an
      EOF message to their peers regardless of errors on the segments, so
      the receivers cannot tell the difference between an EOF and an error.

      The solution in this commit is to not send the EOF to the peers if a
      sender has encountered an error, and to let the QD check the status of
      all segments when it cannot read data from the interconnect for a
      long time.
    • Restore TCP interconnect · 353a937d
      Committed by Pengzhou Tang
      This commit restores the TCP interconnect and fixes some hang issues:

      * restore the TCP interconnect code
      * add a GUC called gp_interconnect_tcp_listener_backlog to control
        the backlog parameter of the listen() call
      * use memmove() instead of memcpy() because the memory areas overlap
        (see the sketch below)
      * call checkForCancelFromQD() in the TCP interconnect if there is no
        data for a while, to avoid the QD getting stuck
      * revert the cancelUnfinished-related modification in 8d251945,
        otherwise some queries get stuck
      * move and rename the fault injector "cursor_qe_reader_after_snapshot"
        to make test cases pass under the TCP interconnect
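
      Why memmove() matters when the source and destination overlap; a
      small standalone example:

          #include <stdio.h>
          #include <string.h>

          int main(void)
          {
              char buf[] = "abcdef";

              /* Shift the tail left by two bytes. Source (buf + 2) and
               * destination (buf) overlap, so memcpy() would be undefined
               * behavior here; memmove() handles overlap correctly. */
              memmove(buf, buf + 2, strlen(buf + 2) + 1);
              printf("%s\n", buf);   /* prints "cdef" */
              return 0;
          }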
  17. 06 Jun 2017, 1 commit
  18. 01 Jun 2017, 2 commits
    • Fixup subplans referring to same plan_id · d0aea184
      Committed by Bhuvnesh Chaudhary
      Before parallelizing the nodes of a plan in cdbparallelize, if any
      subplan nodes in the plan refer to the same plan_id, the
      parallelization step breaks, since a node must be processed only once
      by it. This patch fixes the issue by generating a new subplan node in
      the glob subplans, and updating the plan_id of the subplan to refer
      to the newly created node.
    • Optimize DistributedSnapshot check and refactor to simplify. · 3c21b7d8
      Committed by Ashwin Agrawal
      Before this commit, a snapshot stored the distributed in-progress
      transactions populated during snapshot creation together with the
      corresponding localXids found later during tuple visibility checks
      (used as a cache for the reverse mapping), in a single tightly
      coupled data structure, DistributedSnapshotMapEntry. Storing the
      information this way posed a couple of problems:

      1] Only one localXid can be cached per distributedXid. With
      sub-transactions, the same distributedXid can be associated with
      multiple localXids, but since only one can be cached, the
      distributed_log must be consulted for the other localXids associated
      with the distributedXid.

      2] While performing a tuple visibility check, the code always had to
      loop over the full size of the distributed in-progress array first,
      just to check whether a cached localXid could be used to avoid the
      reverse mapping.

      Now the distributed in-progress array and the localXid cache are
      decoupled. This allows storing multiple localXids per distributedXid.
      It also allows scanning the localXid cache only when the tuple's xid
      is relevant to it, and scanning only as many elements as are actually
      cached, instead of always scanning the full distributed in-progress
      array even when nothing was cached.

      Along the way, the relevant code was refactored a bit to simplify it
      further.
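
      A structural sketch of the decoupling, with hypothetical type and
      field names rather than the actual GPDB definitions:

          #include <stdint.h>

          /* Before (illustrative): one tightly coupled entry per
           * distributed xid, with room for a single cached local xid. */
          typedef struct DistributedSnapshotMapEntry
          {
              uint64_t distribXid;
              uint32_t localXid;      /* only one can be cached */
          } DistributedSnapshotMapEntry;

          /* After (illustrative): the in-progress array and the cache are
           * separate, so the cache can hold several local xids per
           * distributed xid and is scanned only up to its fill count. */
          typedef struct DistributedSnapshot
          {
              uint64_t  *inProgressXidArray;   /* distributed xids */
              int        count;

              uint32_t  *cachedLocalXids;      /* grown on use */
              uint64_t  *cachedDistribXids;
              int        cacheCount;
          } DistributedSnapshot;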
  19. 15 May 2017, 1 commit
    • Make a copy of Gp_interconnect_snd_queue_depth for each interconnect setup · 7ed4a178
      Committed by Pengzhou Tang
      Formerly, GPDB assumed that Gp_interconnect_queue_depth was constant
      during the interconnect's lifetime, which was incorrect for cursors:
      if Gp_interconnect_queue_depth was changed after a cursor was
      declared, a panic occurred. To avoid this, we make a copy of
      Gp_interconnect_queue_depth when the interconnect is set up.
      Gp_interconnect_snd_queue_depth has no such problem because it is
      only used by senders, and the senders of a cursor never receive the
      GUC change command.
  20. 11 May 2017, 1 commit
  21. 04 May 2017, 2 commits
  22. 28 Apr 2017, 2 commits
    • Make HeapTupleSatisfiesVacuum also consider distributedXmin. · 0d7839d7
      Committed by Ashwin Agrawal
      Vacuum now uses the lowest distributed xid (dxid) to determine the
      oldest transaction running *globally* in the cluster, to make sure a
      tuple is DEAD globally before removing it.
      HeapTupleSatisfiesVacuum() consults the distributed snapshot, by
      reverse-mapping the localXid to a distributed xid, to check
      xminAllDistributedSnapshots and verify that the tuple is no longer
      needed anywhere. Note the check is conservative: if it cannot check
      against a distributed snapshot (as in utility-mode vacuum), it keeps
      the tuple rather than prematurely getting rid of it and suffering the
      same problem.

      This fixes the problem of removing a tuple that is still needed.
      Earlier, the check was based only on local information (oldestXmin)
      on a segment, and hence could clean up a tuple visible to a
      distributed query that had not yet reached the segment, which breaks
      snapshot isolation.

      Fixes #801.
    • Correct calculation of xminAllDistributedSnapshots and set it on QEs. · d887fe0c
      Committed by Ashwin Agrawal
      For vacuum, page pruning and freezing to perform their job correctly
      on the QEs, they need to know, globally, the lowest dxid that any
      transaction in the whole cluster can still see. Hence the QD must
      calculate that value and send it to the QEs. For this purpose we use
      logic similar to the calculation of globalxmin for local snapshots:
      the TMGXACT of a global transaction serves a role similar to PROC,
      and hence is leveraged to provide the lowest gxid for its snapshot.
      Using its array, shmGxactArray, we can easily find the lowest xmin
      across all global snapshots and pass it down to the QEs via the
      snapshot.

      A unit test for createDtxSnapshot is added along with the change.
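
      The min-over-all-snapshots scan this describes is essentially the
      following loop; the parameter names are illustrative:

          #include <stdint.h>

          /* Illustrative: find the lowest snapshot xmin across all
           * in-progress global transactions, analogous to how globalxmin
           * is derived from the proc array for local snapshots. */
          static uint64_t
          lowest_gxid_across_snapshots(const uint64_t *snapshot_xmins,
                                       int count, uint64_t my_xmin)
          {
              uint64_t lowest = my_xmin;
              int      i;

              for (i = 0; i < count; i++)
                  if (snapshot_xmins[i] < lowest)
                      lowest = snapshot_xmins[i];
              return lowest;
          }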
  23. 21 Apr 2017, 1 commit
  24. 12 Apr 2017, 1 commit
  25. 11 Apr 2017, 2 commits
    • Remap transient typmods on receivers instead of on senders. · d8ac3308
      Committed by Pengzhou Tang
      The QD used to send a transient-types table to the QEs, and a QE would
      remap the tuples with this table before sending them to the QD.
      However, in complex queries the QD can't discover all the transient
      types, so tuples can't be correctly remapped on the QEs. One example:

          SELECT q FROM (SELECT MAX(f1) FROM int4_tbl
                         GROUP BY f1 ORDER BY f1) q;
          ERROR:  record type has not been registered

      To fix this issue we changed the underlying logic: instead of sending
      the possibly incomplete transient-types table from the QD to the QEs,
      we now send the tables from motion senders to motion receivers and do
      the remapping on the receivers. Receivers maintain a remap table for
      each motion, so tuples from different senders can be remapped
      accordingly. This way, queries containing multiple slices can also
      handle transient record types correctly between two QEs.

      The remap logic is derived from executor/tqueue.c in upstream
      PostgreSQL. There is support for composite/record types and arrays as
      well as range types; however, as range types are not yet supported in
      GPDB, that logic is put under a conditional compilation macro. In
      theory it will be automatically enabled when range types are
      supported in GPDB.

      One side effect of this approach is a performance penalty on the
      receivers, as the remapping requires recursive checks on each tuple
      of a record type. However, an optimization keeps this side effect
      minimal for non-record types.

      The old logic of building the transient-types table on the QD and
      sending it to the QEs is retired.
      Signed-off-by: Gang Xiong <gxiong@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
    • Fix and refactor blockdirectory updates for AO and CO. · fe1b7616
      Committed by Ashwin Agrawal
      ALTER TABLE ADD COLUMN for a CO table completely missed updating the
      block directory if the default value for the column was greater than
      the blockSize. In this case one large-content block is created,
      followed by small-content blocks containing the actual column value.
      Failing to update the block directory produces wrong results during
      index scans after such an ALTER. This commit fixes the issue by
      updating the block directory for this case, accompanied by a test to
      validate it.

      While fixing this, the code is also refactored:
      - rename lastWriteBeginPosition to logicalBlockStartOffset for better
        clarity based on its usage
      - centralize block-directory inserts in the datumstream block
        read-write routines
      - remove the redundant buildBlockDirectory flag
  26. 07 Apr 2017, 2 commits
    • Define a GUC 'max_resource_groups' to constrain the number of resource groups created in a database. · 3dc4703f
      Committed by Kenan Yao

      Signed-off-by: Richard Guo <riguo@pivotal.io>
      Signed-off-by: Gang Xiong <gxiong@pivotal.io>
    • Implement concurrency limit of resource group. · d0c6a352
      Committed by Kenan Yao
      The work includes:
      * define the structures used by resource groups in shared memory;
      * insert/remove the shared memory object on CREATE/DROP RESOURCE
        GROUP;
      * clean up and restore state when CREATE/DROP RESOURCE GROUP fails;
      * implement the concurrency slot acquire/release functionality;
      * sleep when no concurrency slot is available, and wake up others
        when releasing a concurrency slot if necessary (see the sketch
        below);
      * handle signals in resource groups properly.

      Signed-off-by: Richard Guo <riguo@pivotal.io>
      Signed-off-by: Gang Xiong <gxiong@pivotal.io>
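
      A sketch of the acquire/release protocol described in the list above.
      The locking and sleep/wakeup primitives here are hypothetical
      stand-ins for whatever machinery the backend actually uses:

          /* Illustrative slot accounting for one resource group. */
          typedef struct GroupSlots
          {
              int nFree;   /* available concurrency slots */
              /* the real structure also holds a lock and a wait queue */
          } GroupSlots;

          /* Hypothetical primitives; wait_on_queue() releases the lock
           * while sleeping, like a condition wait. */
          extern void lock_group(GroupSlots *g);
          extern void unlock_group(GroupSlots *g);
          extern void wait_on_queue(GroupSlots *g);
          extern void wake_first_waiter(GroupSlots *g);

          static void
          slot_acquire(GroupSlots *g)
          {
              lock_group(g);
              while (g->nFree == 0)
                  wait_on_queue(g);   /* sleep until a release wakes us */
              g->nFree--;
              unlock_group(g);
          }

          static void
          slot_release(GroupSlots *g)
          {
              lock_group(g);
              g->nFree++;
              wake_first_waiter(g);   /* if anyone is sleeping */
              unlock_group(g);
          }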