Commits · 4b3609423ae8e1d269bf0b02238fd05af7ca495d · Greenplum / Gpdb

Jun 13, 2016 · 2 commits

Dispatch exactly same text string for all slices. · 4b360942

Committed by Kenan Yao on Jun 06, 2016

Include a map from sliceIndex to gang_id in the dispatched string, and
remove the localSlice field; the QE now gets its localSlice from the map.
This way, we avoid duplicating and modifying the dispatch text string
slice by slice, and every QE of a sliced dispatch now receives the same
contents.

The extra space cost is sizeof(int) * SliceNumber bytes, and the extra
computing cost is one pass over the SliceNumber-sized array. Compared with
the memcpy of the text string for each slice in the previous implementation,
this is much cheaper, because SliceNumber is much smaller than the size of
the dispatch text string. Also, since SliceNumber is so small, we simply
use an array for the map instead of a hash table.
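
The slice-to-gang mapping described above can be sketched in C as follows. This is a minimal illustration under assumptions: `SliceGangMap`, `find_local_slice()` and `slice_map_extra_bytes()` are hypothetical names, not the actual GPDB dispatcher structures. It only shows the idea that every QE receives the same array and locates its own slice with a linear scan.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: one identical command string goes to every QE.
 * Instead of a per-slice localSlice field, it carries a small array
 * mapping sliceIndex -> gang_id, which each QE scans for its own gang.
 */
typedef struct SliceGangMap
{
    int   numSlices;      /* SliceNumber */
    int  *gangIdOfSlice;  /* gangIdOfSlice[sliceIndex] == gang_id */
} SliceGangMap;

/* Linear scan: SliceNumber is small, so a plain array beats a hash table. */
static int
find_local_slice(const SliceGangMap *map, int myGangId)
{
    for (int i = 0; i < map->numSlices; i++)
    {
        if (map->gangIdOfSlice[i] == myGangId)
            return i;
    }
    return -1;            /* this gang has no slice assigned */
}

/* Extra bytes carried by the dispatched string: sizeof(int) * SliceNumber. */
static size_t
slice_map_extra_bytes(const SliceGangMap *map)
{
    return sizeof(int) * (size_t) map->numSlices;
}
```

For example, with a map of `{7, 3, 9}`, the QE belonging to gang 3 finds that it must execute slice 1.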

Also, clean up some dead code in the dispatcher, including:
(1) Remove the primary_gang_id field of the Slice struct and the DispatchCommandDtxProtocolParms
struct, since the dispatch agent is now deprecated;
(2) Remove redundant logic in cdbdisp_dispatchX;
(3) Clean up buildGpDtxProtocolCommand.

Fix bugs when a SET command is executed within init plan · 2cda812c

Committed by Pengzhou Tang on May 24, 2016

In commit d2725929, GPDB marked all allocatedReaderGangs with the noReuse flag. When a plan contains
an init plan and a SET command is executed within it, GPDB marks the pre-assigned gangs as noReuse and
destroys them, which makes the query crash.

Jun 11, 2016 · 4 commits

The GPDB Vmem is the lowest layer of memory allocator that supports higher... · 42ca3506

Committed by Foyzur Rahman on Jun 10, 2016

The GPDB Vmem is the lowest layer of memory allocator that supports higher allocators such as AllocSet. This layer (mostly defined in memprot.c) is in charge of actually calling malloc/realloc/free to allocate/reallocate/free memory. In this process this layer is also in charge of reserving "virtual" memory or Vmem, which is a GPDB specific shared memory counter to track per-segment combined allocations across all the GPDB processes under Vmem umbrella. The Vmem counter is managed by a separate module Vmem_Tracker, and the memprot functions (such as gp_malloc, gp_free2 and gp_realloc) call the APIs provided by VmemTracker.

Previously the memprot allocators (gp_alloc/gp_realloc/gp_free) only allocated and freed memory; they did not add any metadata in a header (there was no header) to track the size of allocations. Therefore, there was no gp_free, because freeing memory requires the size of the allocation to adjust the Vmem counter inside VmemTracker. This was worked around by explicitly passing the size in gp_free2.

In this PR we do the following:

* We add allocation size in Vmem header (along with checksums which are only available in debug build to detect header and footer boundary, and buffer overruns).

* We remove size information from the block header of AllocSet.

* We rename gp_free2 to gp_free, as the second parameter (size information) is now obtained from the header and is therefore no longer necessary.

* We modify all the consumers of memprot.c APIs to use the new APIs.

* We add unit tests to test the metadata and the correctness of the new Vmem allocators.

This is the first step to integrate external modules and third party allocations with Vmem. A long running issue in GPDB is its inability to track allocations by external components including libraries such as ORCA. Therefore, the central Vmem counter is often way off from the underlying allocations, and this may run the system out of memory. By maintaining the size information in the Vmem header, we now have a self-contained allocator that can be exposed to external allocators such as GPOS allocators, without forcing them to manage size information separately.
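
A minimal sketch of a size-tracking allocator of the kind described above, assuming plain malloc()/free() underneath and a process-local counter standing in for the shared Vmem counter. The names (`sketch_malloc`, `sketch_free`, `vmem_reserved`) are hypothetical; the real memprot.c header also carries debug-build checksums and a footer check, which are omitted here, and realloc is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical header placed in front of every user allocation.  The union
 * with max_align_t pads it so the user pointer stays suitably aligned.
 */
typedef union VmemHeaderSketch
{
    size_t      size;    /* user-requested bytes */
    max_align_t align;   /* padding only */
} VmemHeaderSketch;

static size_t vmem_reserved = 0;   /* stand-in for the shared Vmem counter */

static void *
sketch_malloc(size_t size)
{
    VmemHeaderSketch *hdr = malloc(sizeof(VmemHeaderSketch) + size);

    if (hdr == NULL)
        return NULL;
    hdr->size = size;
    vmem_reserved += size;          /* "reserve" Vmem for this request */
    return hdr + 1;                 /* user memory starts after the header */
}

static void
sketch_free(void *ptr)
{
    if (ptr == NULL)
        return;
    VmemHeaderSketch *hdr = (VmemHeaderSketch *) ptr - 1;
    vmem_reserved -= hdr->size;     /* size comes from the header, not the caller */
    free(hdr);
}

static size_t
sketch_alloc_size(const void *ptr)
{
    return ((const VmemHeaderSketch *) ptr - 1)->size;
}
```

Because the size travels with the block, the free routine needs only the pointer, which is what allows a single-argument gp_free in place of gp_free2.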

This fixes #117269929.
Signed-off-by: Marc Spehlmann <marc.spehlmann@gmail.com>

[#120984085] Adds GUC for array expansion. · 53230187

Committed by Jesse Zhang and Marc Spehlmann on Jun 07, 2016

This GUC will be used to control the MEMO size as well as optimization
time for large IN list or large array comparison expressions.

Only arrays with fewer elements than the GUC value will be
expanded and participate in constraint derivation.

The trade-off of using this GUC is the loss of potential benefits from
constraint derivation (e.g. conflict detection, partition elimination)
in exchange for shorter optimization time and lower memory utilization.
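
The cut-off rule can be illustrated with a small C sketch. The GUC name `array_expansion_threshold` and its value of 25 are assumptions made for this example only; the commit message does not state the actual name or default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the new GUC; name and value are assumed. */
static int array_expansion_threshold = 25;

/*
 * Expand an IN list / array comparison into per-element predicates only
 * when it has fewer elements than the threshold.  Larger arrays are left
 * unexpanded: cheaper to optimize, but they take no part in constraint
 * derivation.
 */
static bool
should_expand_array(int nelems)
{
    return nelems < array_expansion_threshold;
}

/* Number of per-element constraints the optimizer would derive. */
static int
derived_constraints(int nelems)
{
    return should_expand_array(nelems) ? nelems : 0;
}
```

A 1000-element IN list would therefore contribute no derived constraints, trading possible partition elimination for a smaller MEMO and faster optimization.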

Reset g_dataSourceCtx and close g_dataSource on the end of transaction · 89340690
Committed by Nikos Armenatzoglou on Jun 10, 2016

Revert "Reset g_dataSourceCtx variable, since the context is destroyed at EOX." · b918bb17

Committed by Nikos Armenatzoglou on Jun 10, 2016

This reverts commit d9edb869.
g_dataSourceCtx should not be reset in AtAbort_ExtTables, since external tables are not the only component that uses it.

Jun 10, 2016 · 2 commits

Add check for backup directory existence to gprecoverseg · 5b853b3a

Committed by Jamie McAtamney on Jun 09, 2016

A previous commit added the capability to gprecoverseg to copy all
backup files from the mirror to the primary during a full recovery,
but assumed that the backup directory would always exist and so
gprecoverseg -F would fail if it was not present.  This commit adds
a check to ensure that gprecoverseg -F will finish successfully if
there is no backup directory.

Update minirepro and pg_dump utility to dump both relation and function ddl · 318b86af

Committed by Haisheng Yuan on Apr 22, 2016

This patch updates minirepro utility to support the following functions:
1. Dump ddl and stats for multiple queries in a query file
2. Dump ddl of relation that is used in CTE
3. Dump ddl of function that is used in the query
4. Add 2 options: relation-oids and function-oids into pg_dump command line tool

Jun 09, 2016 · 4 commits

Simplify counting of tupletable slots, by getting rid of the counting. · 10ca88b4

Committed by Heikki Linnakangas on Jun 09, 2016

Backport this patch from PostgreSQL 9.0, which replaces the tuple table
array with a linked list of individually palloc'd slots. With that, we
don't need to know the size of the array beforehand, and don't need to
count the slots. The counting was especially funky for subplans in GPDB,
and it was about to change with the upcoming PostgreSQL 8.3 merge again.
This makes it a lot simpler.

I don't plan to backport the follow-up patch to remove the ExecCountSlots
infrastructure. We'll get that later, when we merge with PostgreSQL 9.0.

commit f92e8a4b
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Sep 27 20:09:58 2009 +0000

    Replace the array-style TupleTable data structure with a simple List of
    TupleTableSlot nodes.  This eliminates the need to count in advance
    how many Slots will be needed, which seems more than worth the small
    increase in the amount of palloc traffic during executor startup.

    The ExecCountSlots infrastructure is now all dead code, but I'll remove it
    in a separate commit for clarity.

    Per a comment from Robert Haas.
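
A rough C sketch of the list-based tuple table, assuming malloc() in place of palloc() and a hand-rolled singly linked list in place of the PostgreSQL List type; `SlotSketch`, `TupleTableSketch` and the functions are hypothetical names, not the upstream code. It shows why no up-front slot count is needed: each slot is allocated when requested and released by walking the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical slot node; a real TupleTableSlot holds tuple state. */
typedef struct SlotSketch
{
    int                 id;
    struct SlotSketch  *next;
} SlotSketch;

typedef struct TupleTableSketch
{
    SlotSketch *head;
    int         nslots;
} TupleTableSketch;

/* Allocate one more slot on demand and link it in; no pre-counting. */
static SlotSketch *
sketch_make_slot(TupleTableSketch *tt)
{
    SlotSketch *slot = malloc(sizeof(SlotSketch));

    if (slot == NULL)
        return NULL;
    slot->id = tt->nslots++;
    slot->next = tt->head;      /* prepend: O(1), order is irrelevant */
    tt->head = slot;
    return slot;
}

/* At executor shutdown, walk the list and free every slot. */
static int
sketch_drop_table(TupleTableSketch *tt)
{
    int         freed = 0;
    SlotSketch *slot = tt->head;

    while (slot != NULL)
    {
        SlotSketch *next = slot->next;

        free(slot);
        slot = next;
        freed++;
    }
    tt->head = NULL;
    tt->nslots = 0;
    return freed;
}
```

The cost is one allocation per slot instead of one for the whole array, which is the "small increase in the amount of palloc traffic" the upstream message accepts.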

Change the way temp schemas are created. · 8a2cf323

Committed by Heikki Linnakangas on Jun 09, 2016

This reverts much of the changes vs. upstream, related to temp schema
creation. Instead of using the normal CREATE SCHEMA processing to also
create the temporary schema, let InitTempTableNameSpace() do that
like in the upstream. But in addition to creating the temp schema
locally, it dispatches a special CreateSchemaStmt command to the
executor nodes, which instructs the executor nodes to also call
InitTempTableNameSpace().

Bump check on regression database size to 5 GB. · 2ba36ba6

Committed by Heikki Linnakangas on Jun 08, 2016

The regression database has grown over time, so that it's just above the
1 GB size that the regression test used as a "sanity check". I think the
new zlib regression test broke the camel's back. Bump it up to 5 GB, giving
us about 4 GB of headroom to grow.

Add unittests in the build step of GPDB in concourse. (#831) · c19b3a05
Committed by Shreedhar Hardikar on Jun 08, 2016

Jun 08, 2016 · 4 commits

Function to rebuild free tid list. · 00ac15f6

Committed by Asim R P on May 04, 2016

If found to be inconsistent, the free tid list will be rebuilt automatically
during recovery. During normal operation, a superuser may invoke the function
gp_persistent_freelist_rebuild(OID) to rebuild the free list.

A basic test case is added to verify the sanity of a free tid list rebuilt
using the function.
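
One way to rebuild a free list from scratch is to ignore the existing (possibly inconsistent) chain and re-derive it from each entry's in-use state. The C sketch below illustrates that approach under assumptions: `EntrySketch`, `rebuild_free_list()` and the index-0 sentinel are hypothetical and do not describe the actual layout of GPDB persistent tables or of gp_persistent_freelist_rebuild().

```c
#include <assert.h>
#include <stdbool.h>

#define NO_FREE 0   /* hypothetical terminator; entry 0 is never handed out */

typedef struct EntrySketch
{
    bool inUse;
    int  nextFree;  /* meaningful only when !inUse */
} EntrySketch;

/*
 * Discard the old chain and rebuild it from the authoritative inUse flags.
 * Walking backwards yields a list in ascending index order.
 * Returns the new free-list head (NO_FREE if nothing is free).
 */
static int
rebuild_free_list(EntrySketch *entries, int nentries)
{
    int head = NO_FREE;

    for (int i = nentries - 1; i >= 1; i--)
    {
        if (!entries[i].inUse)
        {
            entries[i].nextFree = head;
            head = i;
        }
    }
    return head;
}
```

Whatever garbage the old `nextFree` links contained, the rebuilt chain visits exactly the entries that are not in use, each once.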

Formatting changes. · c4f4e27b

Committed by Asim R P on Apr 27, 2016

Break long elog messages into multiple lines, remove trailing
whitespace and start elog messages with lower case.

Fix broken zlib unittests · 3d22af62

Committed by Nikos Armenatzoglou on Jun 07, 2016

Fix parser to understand case / when expression in group by · c32cf0e6

Committed by Karthikeyan Jambu Rajaraman on Jun 01, 2016

This closes #815

Jun 07, 2016 · 17 commits

Don't call ExecAssignScanProjectionInfo while in partitionMemoryContext. · d7662fd7

Committed by Heikki Linnakangas on Jun 07, 2016

This allows removing the weird pfree() of the resultTupleSlot's tuple
descriptor. What would've happened without the pfree() is that the old
slot was allocated in the first ExecAssignScanProjectionInfo() call, in
partitionMemoryContext, and then immediately destroyed when the memory
context was reset. The second call to ExecAssignScanProjectionInfo() tries
to free the slot again, causing the segfault. Rearranging the calls avoids
that in a cleaner way.

In passing, clean up the code a bit. I found having separate variables,
indexState and scanState, which point to the same struct, to be confusing.

Fix out-of-bounds writes to scanTupleSlot · a33699bc

Committed by Heikki Linnakangas on Jun 07, 2016

ss_ScanTupleSlot is not an array, it's a single slot. The slot is allocated
from a bigger array, however, so this trampled over some other slot that was
allocated right after the scan slot. This has apparently been harmless, as
no-one's noticed, but it's surely wrong.

I bumped into this in the PostgreSQL 8.3 merge branch, where I had changed
the way the slots are allocated so that they're not stored in one big array
anymore. This bug led to segfaults in that case.

Remove limitation that a default must be provided for ADD COLUMN on AO table · 455f0e19

Committed by Heikki Linnakangas on Jun 07, 2016

We had later added code that allows "DEFAULT NULL", by doing a table
rewrite. DEFAULT NULL is really the same as no default, so we might as well
do a table rewrite for that case too, and save the code needed to handle
them differently.

Honour PLAIN storage also when constructing a memtuple. · 625ffa82

Committed by Heikki Linnakangas on Jun 07, 2016

A function might legitimately assume that an argument's varlen datum always
has a 4-byte header, if the datatype is marked as 'plain', and crash if we
then pass it a datum with 1-byte header, because it was packed in a
memtuple. I bumped into this while working on the PostgreSQL 8.3 merge,
because the merge brought us one such function: ts_rewrite().

Remove duplicated bitgetpage() function. · 77659037

Committed by Heikki Linnakangas on Jun 07, 2016

The one in nodeBitmapHeapScan.c is what the upstream has. The one in
execBitmapHeapScan.c was copied from it in GPDB. No need for the
duplication.

It's not clear to me why we have the execBitmapHeapScan.c file at all,
why not just use all the functions in nodeBitmapHeapScan.c. But I'll
leave investigating that for another day.

Move the check for gp_mapreduce, to reduce diff vs. upstream. · ff881c1a

Committed by Heikki Linnakangas on Jun 07, 2016

This has the effect that gpmapreduce is exempt from logging also when it
uses the extended query protocol (we used to only perform the checks in
exec_simple_query()). That makes no difference in practice, though, because
gpmapreduce doesn't actually use the extended query protocol.

Remove instr_time definition in favor of portability/instr_time.h · e4b43c18

Committed by Daniel Gustafsson on Apr 29, 2016

The redefinitions broke the win32 build on Pulse. This was part of
3bc25384 which was backported from
upstream but left this in. This makes the win32 Pulse build green.

Arrange CI CONCURRENTLY so that index is created in master first. · 07d0b25c

Committed by Heikki Linnakangas on Jun 07, 2016

This isn't strictly necessary, but makes the code easier to read, IMHO.
And it seems like a very useful property that the index can only exist in
the segments, if it exists in the master.

Misc whitespace fixes · 05d56dee

Committed by Heikki Linnakangas on Jun 07, 2016

Don't decompress datums before sending them over the network. · b1aa03e4

Committed by Heikki Linnakangas on Jun 07, 2016

It seems better to send in compressed form, and decompress in the receiver,
to reduce network I/O. The amount of CPU work required for the
decompression is the same whether it's done in the sender or the receiver.
There was even a comment implying that, but for some reason, we didn't do
it that way.

This is in preparation for the PostgreSQL 8.3 merge: The HEAP_COMPRESSED
(and HEAP_HASEXTENDED which included it) flag was removed in PostgreSQL
8.3.

Split bitmap-index related tests from create_index into a separate test. · e8a49a3e

Committed by Heikki Linnakangas on Jun 07, 2016

Merging is easier, when the upstream regression test files (create_index)
are as unmodified as possible, and any GPDB added tests are added to
separate files (bitmap_index). Also, the only planner vs. ORCA differences
in output were in the GPDB-added parts of this, so this allows removing
the ORCA alternative expected output file for the create_index test.

Reset g_dataSourceCtx variable, since the context is destroyed at EOX. · d9edb869

Committed by Heikki Linnakangas on Jun 07, 2016

Commit dc87e717 moved DataSourceContext to TopTransactionContext, so it
doesn't need to be explicitly destroyed at end of transaction anymore. But
the pointer to it still needs to be reset.

Fixes failures in the external_table regression test.

Initialize SortBy.node correctly. · 2d7cc67a

Committed by Heikki Linnakangas on Jun 07, 2016

Per report and analysis by Asim R P.

New Python function to start a process using Popen · 1e4947a2

Committed by Asim R P on Jun 03, 2016

The existing run() function waits for the process to finish. On many
occasions, e.g. transaction management tests, it is desirable to leave the
process running in background and check its status later.

Fixing unit tests because of gcc proprietary extensions (#120725493). · a910c040

Committed by Foyzur Rahman on Jun 06, 2016

Signed-off-by: Marc Spehlmann <marc.spehlmann@gmail.com>

Add behave test for foreign key check in gpcheckcat · 8814e6af

Committed by Chumki Roy on Jun 03, 2016

Signed-off-by: Zak Auerbach <zauerbach@pivotal.io>

Fix another neglected alternative expected output file. · eceeb696

Committed by Heikki Linnakangas on Jun 06, 2016

Jun 06, 2016 · 5 commits

Fix ORCA alternative expected outputs. · d4506033

Committed by Heikki Linnakangas on Jun 06, 2016

I neglected these in the previous commit.

Backport from PostgreSQL 8.4 · 78b0a42e

Committed by Heikki Linnakangas on Jun 06, 2016

This is a partial backport of a larger body of work, parts of which have
already been backported.

Remove the GPDB-specific "breadcrumbs" mechanism from the parser. It is
made obsolete by the upstream mechanism. We lose context information from
a few errors, which is unfortunate, but seems acceptable. Upstream doesn't
have context information for those errors either.

The backport was originally done by Daniel Gustafsson, on top of the
PostgreSQL 8.3 merge. I tweaked it to apply it to master, before the
merge.

Upstream commit:

  commit b153c092
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Date:   Mon Sep 1 20:42:46 2008 +0000

    Add a bunch of new error location reports to parse-analysis error messages.
    There are still some weak spots around JOIN USING and relation alias lists,
    but most errors reported within backend/parser/ now have locations.

Revert contain_vars_of_level() to match the upstream. · 756241e2

Committed by Heikki Linnakangas on Jun 04, 2016

The function was rewritten in GPDB, and its behaviour was changed to also
return 'true' if the expression contains an Aggref of the given level.
That change in behaviour was made back in 2006, as part of a commit
containing a lot of subquery optimization changes. I could not find an
explanation for that particular change, and all the regression tests pass
without it, so I assume that it has become obsolete at some point over the
years.

This smoothens the way for future merges with upstream, by reducing the
diff in both code and behaviour. Also, you get a more accurate error
message in a few cases, as seen by the changes to expected output.

Add intarray contrib module from upstream PostgreSQL 8.3 · 26330731

Committed by Daniel Gustafsson on May 18, 2016

This brings in the intarray contrib module from 8.3 with a few
minor changes.

   * The Gin operator class is removed from the installation since
     Gin indexes are currently turned off in GPDB
   * A few compiler warnings are silenced
   * fdf2dbda which contains many
     bug-fixes is cherry-picked from upstream
   * A bug in bench.pl for printing the query plan is fixed
   * gppkg files are provided in package/

Add array_contains_nulls() function in arrayfuncs.c. · e85d268f

Committed by Tom Lane on Jan 08, 2011

This will support fixing contrib/intarray (and probably other places)
so that they don't have to fail on arrays that contain a null bitmap
but no live null entries.
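
The check the message describes can be sketched in C as follows. This is a simplified version, assuming PostgreSQL's null-bitmap convention (bit set means the element is present, bit clear means NULL, least significant bit first); `sketch_array_contains_nulls()` takes a raw bitmap and item count rather than an ArrayType, so it is not the upstream function's signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true only if some live element is actually NULL.  An array can
 * carry a null bitmap in which every live bit is set, so "has a bitmap"
 * alone is not proof of NULLs.
 */
static bool
sketch_array_contains_nulls(const unsigned char *nullbitmap, int nitems)
{
    if (nullbitmap == NULL)
        return false;               /* no bitmap: no NULLs possible */

    /* Whole bytes first: 0xFF means eight non-NULL elements. */
    while (nitems >= 8)
    {
        if (*nullbitmap != 0xFF)
            return true;
        nullbitmap++;
        nitems -= 8;
    }

    /* Remaining elements live in the low-order bits of the last byte. */
    for (int bit = 0; bit < nitems; bit++)
    {
        if ((*nullbitmap & (1 << bit)) == 0)
            return true;
    }
    return false;
}
```

Unused high-order bits in the final byte are never inspected, so padding there cannot produce a false positive.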

Jun 05, 2016 · 2 commits

Add missing headerfile for TempNamespaceOidIsValid() · 34915231

Committed by Daniel Gustafsson on Jun 04, 2016

Fixes compiler warning for implicit function declaration in proc.c

Use a format string for errmsg() of literal · c7008ca8

Committed by Daniel Gustafsson on Jun 03, 2016

Using the string literal in errmsg directly without a format string
triggers warnings in clang under -Wformat-security. Use a simple
format string containing just a reference to the string literal to
avoid warnings and possibly hiding bugs.
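
The pattern can be illustrated with a hypothetical printf-style reporter standing in for errmsg(), which likewise takes a printf-style format string. `report()` and `report_text()` are invented names for this sketch; the `format` attribute is the GCC/clang mechanism that lets the compiler check format strings at call sites.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_msg[256];   /* hypothetical sink for the formatted message */

/* The attribute tells GCC/clang to check calls like printf(). */
static void report(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

static void
report(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
    va_end(ap);
}

static void
report_text(const char *text)
{
    /*
     * report(text) would trigger -Wformat-security, and any '%' inside
     * text would be parsed as a conversion directive.  Routing it through
     * "%s" prints the text verbatim.
     */
    report("%s", text);
}
```

With this wrapper, a message such as `"100% done"` is reproduced exactly rather than being misread as containing a `% d` conversion.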
