Commits · e92a82d056e286f56a31eed9241ff38910f07f08 · Greenplum / Gpdb

Sep 29, 2018 · 3 commits

Use WITH syntax for options in create tablespace. (#5877) · e92a82d0

Authored by Paul Guo on Sep 29, 2018

PostgreSQL 9.4 allows options in CREATE TABLESPACE through the WITH
syntax. Greenplum previously used an OPTIONS clause to specify
per-segment locations. Unify the two and use only the WITH syntax,
following upstream.

Note that the Greenplum-specific OPTIONS clause exists only in gpdb
master.

Correctly account for OptimizerOutstandingMemoryBalance · cd13f364

Authored by Taylor Vesely on Sep 24, 2018

Because the optimizer has its own memory management system and does not
use AllocationSets, the only way to know how much memory the optimizer
is using is to intercept its calls to malloc() and free(). When the GUC
'optimizer_use_gpdb_allocators' is set to 'true', Orca replaces its
native alloc() and free() methods with Ext_OptimizerAlloc() and
Ext_OptimizerFree(). These calls track the total memory usage in the
active 'Optimizer' memory account, and the total memory outstanding
between queries in OptimizerOutstandingMemoryBalance.

This is a problem when accounting for ORCA memory in the
'X_NestedExecutor' account because, unless the
OptimizerOutstandingMemoryBalance is added, ORCA can free memory that was
allocated in a previous query and underflow the account.

To get an accurate picture of how much memory the optimizer is using,
and to prevent problems with the 'X_NestedExecutor' account, it makes
sense to track ORCA's memory usage in a single account. Therefore, create
only one 'Optimizer' account per query, no matter how many times the
optimizer is invoked.
Co-authored-by: David Kimura <dkimura@pivotal.io>
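To make the underflow concrete, here is a toy model in C. Everything in it is hypothetical: `Account`, `tracked_alloc()`, `tracked_free()` and `outstanding_balance` merely stand in for the 'Optimizer' memory account, Ext_OptimizerAlloc()/Ext_OptimizerFree() and OptimizerOutstandingMemoryBalance, and are not GPDB's code. Each block carries a size header so the free wrapper knows how much to credit back. Freeing memory from an earlier query into a fresh, unseeded account drives it negative; seeding the account with the outstanding balance keeps it consistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory account: bytes allocated and freed while active. */
typedef struct Account {
    long allocated;
    long freed;
} Account;

/* Hypothetical analogue of OptimizerOutstandingMemoryBalance: bytes that
 * are still live, possibly across query boundaries. */
static long outstanding_balance = 0;
static Account *active_account = NULL;

/* Size header kept in front of every block; the union keeps the payload
 * suitably aligned for any type. */
typedef union Header {
    size_t size;
    max_align_t align;
} Header;

/* Make `acct` the active account for a new query. With `seed` set, the
 * account starts out charged with the bytes still outstanding. */
static void begin_query(Account *acct, int seed)
{
    acct->allocated = seed ? outstanding_balance : 0;
    acct->freed = 0;
    active_account = acct;
}

static long account_balance(const Account *acct)
{
    return acct->allocated - acct->freed;
}

/* malloc() wrapper that charges the active account. */
static void *tracked_alloc(size_t size)
{
    Header *hdr = malloc(sizeof(Header) + size);

    if (hdr == NULL)
        return NULL;
    hdr->size = size;
    active_account->allocated += (long) size;
    outstanding_balance += (long) size;
    return hdr + 1;
}

/* free() wrapper that credits the active account, whichever query the
 * block was originally allocated in. */
static void tracked_free(void *ptr)
{
    Header *hdr = (Header *) ptr - 1;

    active_account->freed += (long) hdr->size;
    outstanding_balance -= (long) hdr->size;
    free(hdr);
}
```

In this model, an account opened with `seed = 0` and then used to free a block from the previous query ends up at -100 bytes, which is the underflow the commit describes; an account seeded with the outstanding balance returns to exactly zero once everything is freed.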

Add DEBUG mode to the explain_memory_verbosity GUC · 21f8a491

Authored by Taylor Vesely on Aug 24, 2018

The memory accounting system generates a new memory account for every
execution node initialized in ExecInitNode. The addresses of these memory
accounts are stored in shortLivingMemoryAccountArray. When the memory
allocated for shortLivingMemoryAccountArray is full, we repalloc the
array with double the number of available entries.

After approximately 67,000,000 memory accounts have been created, growing
the array requires allocating more than 1GB of memory, which throws an
ERROR and cancels the running query.
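The 67,000,000 figure follows from the doubling. The quick check below assumes each array entry is an 8-byte pointer and that the array starts at a power-of-two size; it uses PostgreSQL's palloc/repalloc request limit, MaxAllocSize (0x3fffffff, from utils/memutils.h). Under those assumptions the largest reachable capacity is 2^26 = 67,108,864 entries (512MB), and doubling it would need 2^27 × 8 bytes = 1GB, one byte more than MaxAllocSize allows.

```c
#include <assert.h>
#include <stddef.h>

/* PostgreSQL's MaxAllocSize: the largest request palloc/repalloc accept. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Largest capacity reachable by repeatedly doubling an array of
 * `entry_size`-byte entries, starting at `initial` entries, before the
 * next doubling would exceed MAX_ALLOC_SIZE. */
static size_t max_doubled_capacity(size_t initial, size_t entry_size)
{
    size_t capacity = initial;

    assert(initial > 0 && initial * entry_size <= MAX_ALLOC_SIZE);
    while (capacity * 2 * entry_size <= MAX_ALLOC_SIZE)
        capacity *= 2;
    return capacity;
}
```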

PL/pgSQL and SQL functions create new executors/plan nodes that must be
tracked by the memory accounting system. This level of detail is not
necessary for tracking memory leaks, and creating a separate memory
account for every executor uses a large amount of memory just to track
these memory accounts.

Instead of tracking millions of individual memory accounts, we
consolidate every child executor account into a special 'X_NestedExecutor'
account. When explain_memory_verbosity is set to 'detailed' or lower, all
child executors are consolidated into this account.

If more detail is needed for debugging, set explain_memory_verbosity to
'debug', where, as before, every executor is assigned its own
MemoryAccountId.
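The consolidation rule can be sketched as follows. The enum ordering of the verbosity levels (with 'suppress' and 'summary' assumed to sit below 'detailed'), the reserved account ID, the ID counter and the function name are all assumptions for illustration, not GPDB's memory accounting API.

```c
#include <assert.h>

/* Assumed ordering of explain_memory_verbosity levels, least verbose first. */
typedef enum Verbosity {
    VERBOSITY_SUPPRESS,
    VERBOSITY_SUMMARY,
    VERBOSITY_DETAILED,
    VERBOSITY_DEBUG
} Verbosity;

/* Hypothetical reserved ID for the shared 'X_NestedExecutor' account. */
enum { NESTED_EXECUTOR_ACCOUNT_ID = 1 };

/* Hypothetical counter handing out per-node account IDs. */
static int next_account_id = 100;

/* Pick the memory account for a plan node. Nodes in a nested executor
 * (depth > 0, e.g. inside a PL/pgSQL function) share one account unless
 * the verbosity is 'debug'; top-level nodes always get their own. */
static int account_for_node(int executor_depth, Verbosity verbosity)
{
    if (executor_depth > 0 && verbosity < VERBOSITY_DEBUG)
        return NESTED_EXECUTOR_ACCOUNT_ID;
    return next_account_id++;
}
```

In this sketch, a function invoked millions of times at 'detailed' verbosity keeps returning the same shared ID, so no new per-node accounts (and no new array entries) are created for it.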

Originally we tried to remove nested execution accounts after they
finished executing, but rolling those accounts over into an
'X_NestedExecutor' account proved impractical without risking a future
regression.

If any accounts that are not rolled over into an 'X_NestedExecutor'
account are created between nested executors, the record of which
accounts were rolled over can grow in the same way that
shortLivingMemoryAccountArray grows today, and would also become too
large to reasonably fit in memory.

Iterating through the SharedHeaders every time a nested executor
finishes is unlikely to perform well.

While we were at it, convert some of the convenience macros dealing with
memory accounting for executor/planner nodes into functions, and move
them out of the memory accounting header files into the compilation units
of their sole callers.
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>

Sep 28, 2018 · 13 commits

Order active window clauses for greater reuse of Sort nodes. · 3f0d46f7

Authored by Daniel Gustafsson on Sep 28, 2018

This is a backport of the commit below from PostgreSQL 12devel, which in
turn was influenced by an optimization in the previous version of the
Greenplum window code. The idea is to order the Sort nodes by sort
prefix, so that sorts can be reused by subsequent nodes.

Since the tests use EXPLAIN output, a new expected file is added for
ORCA, even though the patch only touches the Postgres planner.

commit 728202b6
Author: Andrew Gierth <rhodiumtoad@postgresql.org>
Date: Fri Sep 14 17:35:42 2018 +0100

Order active window clauses for greater reuse of Sort nodes.

By sorting the active window list lexicographically by the sort clause
list but putting longer clauses before shorter prefixes, we generate
more chances to elide Sort nodes when building the path.

Author: Daniel Gustafsson (with some editorialization by me)
Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
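A minimal C sketch of that ordering, assuming each window clause is reduced to a list of integer sort keys. Upstream's comparator works on SortGroupClause lists and compares more attributes; `WindowSortKeys`, `window_keys_cmp()` and `order_window_clauses()` are hypothetical names. Keys are compared lexicographically, except that when one list is a prefix of the other, the longer list sorts first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A window clause reduced to its sort keys (hypothetical representation:
 * small integers standing for sort column references). */
typedef struct WindowSortKeys {
    const int *keys;
    size_t nkeys;
} WindowSortKeys;

/* Lexicographic order, except that a longer list sorts before any list
 * that is a prefix of it. */
static int window_keys_cmp(const void *a, const void *b)
{
    const WindowSortKeys *wa = a;
    const WindowSortKeys *wb = b;
    size_t n = wa->nkeys < wb->nkeys ? wa->nkeys : wb->nkeys;

    for (size_t i = 0; i < n; i++) {
        if (wa->keys[i] != wb->keys[i])
            return wa->keys[i] < wb->keys[i] ? -1 : 1;
    }
    /* Equal up to the shorter length: put the longer list first. */
    if (wa->nkeys != wb->nkeys)
        return wa->nkeys > wb->nkeys ? -1 : 1;
    return 0;
}

static void order_window_clauses(WindowSortKeys *clauses, size_t n)
{
    qsort(clauses, n, sizeof clauses[0], window_keys_cmp);
}
```

Ordering the clauses (1), (1,2), (2), (1,2,3) this way yields (1,2,3), (1,2), (1), (2): input sorted on (1,2,3) is also sorted on (1,2) and (1), so in this example only the (2) window needs another Sort.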

Fix syntax errors in testsuites that aren't on purpose · 3f6273e1

Authored by Daniel Gustafsson on Sep 28, 2018

The test suites contained a few broken queries that were not broken on
purpose to test the parser/grammar. This fixes the ones that stood out,
but more are likely hiding in ignore blocks, where they slip through the
cracks.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

Remove unnecessary code for the first ORDER BY column in window agg. · e70f73e0

Authored by Heikki Linnakangas on Sep 28, 2018

The purpose of this code was to treat the first ORDER BY column in a
window aggregate like "ROW_NUMBER() OVER (ORDER BY x RANGE BETWEEN 2
PRECEDING AND 2 FOLLOWING)" the same way as volatile expressions, and add
it to the target list as is. That ensured it would be available for
computing the window bounds. But upstream commit a2099360, merged as part
of the 9.3 merge, got rid of the distinction between volatile and
non-volatile expressions, so we no longer need to treat the first ORDER BY
column differently either.

Fix mixup between bitmap index's int48 and int42 support functions. · e628194e

Authored by Heikki Linnakangas on Sep 28, 2018

These were swapped. It has been wrong ever since we merged the operator
family patch during the 8.3 merge. Apparently it wasn't causing any ill
effects, though, or at least I was not able to find a case that would
fail because of it.

This was caught by new sanity checks in the 'opr_sanity' regression
test, introduced in the upcoming 9.4 merge.

Fix the one-element tuple visibility cache in heap scans, for multi-xids. · fec5f0b5

Authored by Heikki Linnakangas on Sep 28, 2018

Using the raw xmax value as part of the cache key is wrong. If the raw
xmax is a multi-XID, the real deleting XID is something else. We could be
fooled if we cached a multi-XID value and later saw a tuple with a regular
xmax that has the same numeric value as the cached multi-XID.

I think this was already broken before the 9.3 merge. If a transaction
locked one tuple and deleted another, and a concurrent scan saw the
locked tuple first, it might think that the deleted tuple is also visible
to it, because it has the same xmin+xmax combination as the locked tuple.
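One way to keep such a one-element cache safe is to make its key include the bits that change how xmax must be interpreted. The C sketch below is an illustration under assumptions, not necessarily the fix made in this commit: `VisCache`, its functions, and the `XMAX_IS_MULTI`/`XMAX_LOCK_ONLY` flag values are hypothetical stand-ins for the relevant tuple infomask bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical flag bits describing how the raw xmax is to be read. */
#define XMAX_IS_MULTI  0x1   /* xmax is a multi-XID, not a plain XID */
#define XMAX_LOCK_ONLY 0x2   /* xmax only locked the tuple */

/* Hypothetical one-element cache of a visibility verdict. */
typedef struct VisCache {
    bool valid;
    TransactionId xmin;
    TransactionId raw_xmax;
    uint16_t xmax_flags;     /* part of the key: same raw xmax, other meaning */
    bool visible;
} VisCache;

static void cache_store(VisCache *c, TransactionId xmin,
                        TransactionId raw_xmax, uint16_t xmax_flags,
                        bool visible)
{
    c->valid = true;
    c->xmin = xmin;
    c->raw_xmax = raw_xmax;
    c->xmax_flags = xmax_flags;
    c->visible = visible;
}

/* Returns true and sets *visible only on an exact key match. */
static bool cache_lookup(const VisCache *c, TransactionId xmin,
                         TransactionId raw_xmax, uint16_t xmax_flags,
                         bool *visible)
{
    if (!c->valid || c->xmin != xmin || c->raw_xmax != raw_xmax ||
        c->xmax_flags != xmax_flags)
        return false;
    *visible = c->visible;
    return true;
}
```

With the flags in the key, a tuple merely locked by XID 500, a tuple deleted by XID 500, and a tuple whose xmax is multi-XID 500 all map to different cache keys, so the locked tuple's cached "visible" verdict cannot be reused for the others.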

Move code marked with FIXME to make_windowInputTargetList(). · fa4a2ccb

Authored by Heikki Linnakangas on Sep 28, 2018

make_windowInputTargetList() seems like a better place for this code,
as suggested by the FIXME comment that was left here in the 9.3 merge.

Disable DISTINCT-qualified aggregates with ORCA. · 386d5423

Authored by Heikki Linnakangas on Sep 28, 2018

The new regression tests revealed that it doesn't work. With an
assertion-enabled ORCA build, I got an assertion failure like this:

2018-08-23 11:20:08:371479 EEST,THD000,ERROR,""/home/heikki/gpdb/optimizer-main/libgpos/include/gpos/common/CDynamicPtrArray.h:300: Failed assertion: pos < m_size && ""Out of bounds access""
Stack trace:
1 0x00007f363fb3e78a gpos::CException::Raise + 252
2 0x00007f3640be0970 gpos::CDynamicPtrArray + 84
3 0x00007f3640c93dac gpopt::CWindowPreprocessor::SplitPrjList + 1162
4 0x00007f3640c9404b gpopt::CWindowPreprocessor::SplitSeqPrj + 303
5 0x00007f3640c94b61 gpopt::CWindowPreprocessor::PexprSeqPrj2Join + 357
6 0x00007f3640c95276 gpopt::CWindowPreprocessor::PexprPreprocess + 316
7 0x00007f3640c240a2 gpopt::CExpressionPreprocessor::PexprPreprocess + 1098
8 0x00007f3640bc2d62 gpopt::CQueryContext::CQueryContext + 696
9 0x00007f3640bc36df gpopt::CQueryContext::PqcGenerate + 1413
10 0x00007f3640c95d86 gpopt::COptimizer::PdxlnOptimize + 1042
11 0x000055b2e8252f26 COptTasks::OptimizeTask + 1488
12 0x00007f363fb58a0d gpos::CTask::Execute + 183
13 0x00007f363fb5d447 gpos::CWorker::Execute + 199
14 0x00007f363fb56d77 gpos::CAutoTaskProxy::Execute + 287
15 0x00007f363fb3479b gpos_exec + 800
"",",,,,,,"explain select dt, pn, sum(distinct pn) over (partition by dt), sum(pn) over (partition by dt) from sale;",0,,"COptTasks.cpp",545,
2018-08-23 11:20:08.372392 EEST,"heikki","postgres",p19807,th1163394560,"[local]",,2018-08-23 11:19:53 EEST,0,con4,cmd7,seg-1,,dx6,,sx1,"LOG","00000","Planner produced plan :1",,,,,,"explain select dt, pn, sum(distinct pn) over (partition by dt), sum(pn) over (partition by dt) from sale;",0,,"orca.c",61,

This caused the query to fall back to the planner, which worked. But
with assertions disabled, it crashed instead.

We should fix ORCA to deal with that. One option is to rip out all the
special code to plan DISTINCT-qualified aggregates in ORCA, and just pass
through the windistinct flag to the executor. That's basically what the
Postgres planner does, and the executor will deal with deduplicating the
input. But for now, let's just stop the crashing.

Reimplement DISTINCT for window aggregates. · 6523b432

Authored by Heikki Linnakangas on Sep 28, 2018

GPDB 5 supported DISTINCT in window aggregates, e.g.:

COUNT(DISTINCT x) OVER (PARTITION BY y)

However, PostgreSQL does not support that, and as a result GPDB lost
that capability in the window functions rewrite, too. Upstream has an
explicit check that rejects it, but that check was lost in the window
function rewrite, so the parser accepted DISTINCT and the query was
executed as if there were no DISTINCT. There were also no tests that
would return a different result with DISTINCT than without, which is why
no one noticed.

To fix, implement DISTINCT support to the same extent as the old
implementation. The new implementation adds a small sort + deduplicate
step for each DISTINCT aggregate. I'm not sure how this compares with the
old implementation performance-wise, but at least it works now.

Also, add the missing tests.
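The per-aggregate sort + deduplicate step amounts to the following, shown in C for SUM(DISTINCT v) over the input values of one partition. This is a simplified illustration with hypothetical names that works on plain `long` values rather than tuples; it is not the executor code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *) a;
    long y = *(const long *) b;

    return (x > y) - (x < y);
}

/* SUM(DISTINCT v) over one partition's input values: sort the inputs,
 * then feed each value to the aggregate only if it differs from the
 * previous one. Sorts `vals` in place. */
static long sum_distinct(long *vals, size_t n)
{
    long sum = 0;

    if (n == 0)
        return 0;            /* real SUM() would return NULL here */
    qsort(vals, n, sizeof vals[0], cmp_long);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && vals[i] == vals[i - 1])
            continue;        /* duplicate of the previous input: skip */
        sum += vals[i];
    }
    return sum;
}
```

Because the values are sorted first, duplicates are adjacent, so comparing each value with its predecessor is enough to deduplicate without a hash table.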

Fix the interval issue when Setting up TCP interconnect · 3c08295f

Authored by Pengzhou Tang on Sep 27, 2018

When TCP connections cannot be set up for a long time, we check whether
some segments have already failed. The check is expensive, so it runs at
an interval of 2 seconds. We used to track that interval with a loop
counter, which is unreliable because a loop cycle (500ms) may end early
when select() returns with EINTR/EAGAIN.

To avoid hurting the setup performance of the TCP interconnect, we need
to make the interval mechanism more reliable.
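A time-based check avoids the problem with counting loop cycles. The C sketch below assumes a POSIX monotonic clock; `SEGMENT_CHECK_INTERVAL_MS`, `monotonic_ms()` and `segment_check_due()` are hypothetical names, and the actual commit may measure time differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical name for the 2-second interval described above. */
#define SEGMENT_CHECK_INTERVAL_MS 2000

/* Milliseconds from a monotonic clock, unaffected by wall-clock changes. */
static uint64_t monotonic_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000 + (uint64_t) ts.tv_nsec / 1000000;
}

/* Decide from elapsed time, not from a count of select() cycles, whether
 * the expensive segment-failure check is due. A select() that returns
 * early with EINTR/EAGAIN therefore cannot shorten the interval. */
static bool segment_check_due(uint64_t *last_check_ms, uint64_t now_ms)
{
    if (now_ms - *last_check_ms < SEGMENT_CHECK_INTERVAL_MS)
        return false;
    *last_check_ms = now_ms;
    return true;
}
```

A connect loop would call `segment_check_due(&last, monotonic_ms())` after every select() return, however early it woke up; four interrupted 100ms waits advance the clock by only 400ms, whereas a cycle counter would have treated them as four full 500ms cycles.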

Allow tables to be distributed on a subset of segments · 4eb65a53

Authored by ZhangJackey on Sep 28, 2018

There was an assumption in gpdb that a table's data is always
distributed on all segments. This is not always true: for example, when
a cluster is expanded from M segments to N (N > M), all the tables are
still on M segments. To work around the problem, we used to have to
alter all hash-distributed tables to be randomly distributed to get
correct query results, at the cost of poor performance.

Now we support distributing table data on a subset of segments.

A new column `numsegments` is added to the catalog table
`gp_distribution_policy` to record how many segments a table's data is
distributed on. This allows DML on tables still distributed on M
segments, and joins between M-segment and N-segment tables are also
supported.

```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments
    from gp_distribution_policy;
 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);
                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);
                          QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Redistribute Motion 2:1  (slice1; segments: 2)
                     Hash Key: b.c1, b.c2
                     ->  Seq Scan on t2 b
 Optimizer: legacy query optimizer
```
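Why a different numsegments forces a Redistribute Motion can be seen with a toy hash-to-segment mapping. This C sketch assumes a plain modulo mapping and a simplified `Policy` struct, both hypothetical; GPDB's real mapping from hash value to segment may differ. A row whose hash value is 1 lands on segment 0 of a 1-segment table but on segment 1 of a 2-segment table, so matching rows are not co-located even though both tables hash on (c1, c2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified distribution policy. */
typedef struct Policy {
    int attrs[4];       /* distribution key column numbers */
    int nattrs;
    int numsegments;    /* segments holding the table's data */
} Policy;

/* Assumed mapping of a row's hash value to a segment: plain modulo. */
static int target_segment(uint32_t hash, int numsegments)
{
    return (int) (hash % (uint32_t) numsegments);
}

/* A join on the distribution keys can run locally only when both sides
 * hash on the same columns and spread data over the same segments. */
static bool colocated(const Policy *a, const Policy *b)
{
    if (a->nattrs != b->nattrs || a->numsegments != b->numsegments)
        return false;
    for (int i = 0; i < a->nattrs; i++)
        if (a->attrs[i] != b->attrs[i])
            return false;
    return true;
}
```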

docs - update optimizer_array_expansion_threshold default to 100 · db9a2ec4
Authored by mkiyama on Sep 27, 2018

docs - update boostfs install instructions. (#5859) · f932f7ee
Authored by Mel Kiyama on Sep 27, 2018
```
-How to find the boostfs config. guide
-How to find the boostfs RPM
```
docs - sql and catalog ref page updates for i/o conversion casts (#5812) · 97e7d520
Authored by Lisa Owen on Sep 27, 2018
```
* docs - sql and catalog ref page updates for i/o conversion casts

* address comments from heikki
```

Sep 27, 2018 · 12 commits

Update the generated pipeline file · c2759ad7

Authored by Joao Pereira on Sep 27, 2018

Commit 7605710c did not update the YAML file with the pipeline
configuration for master.

Change the way quicklz is compiled and used in CI · 7605710c

Authored by David Kimura on Sep 13, 2018

- Due to changes in the structure of gpaddon, we can no longer use the
resource gpaddon_src to compile 5X for Binary Swap jobs.
- From this point on, we should use the 5X_RELEASE tag on gpaddon to
compile Greenplum for these jobs.
- Change the expected quicklz error message while building OSS Greenplum
- Explicitly add the Greenplum bin folder to the path
- Add back the rsync of the quicklz addon folder
  This was added back to ensure that the enterprise build still works
  correctly
- Use the correct branch to compile the Binary Swap version

Ensure that quicklz is not built for Windows

For now, we do not support compiling and installing quicklz for our
Windows build.
Signed-off-by: Joao Pereira <jdealmeidapereira@pivotal.io>

Remove built-in stub functions for QuickLZ compressor. · 589533be

Authored by Heikki Linnakangas on Oct 17, 2017

The proprietary build can install them as normal C language functions,
with CREATE FUNCTION, instead.

In passing, remove some unused QuickLZ debugging GUCs.

This doesn't yet get rid of all references to QuickLZ, unfortunately. The
GUC and reloption validation code still needs to know about it, so that
it can validate the options read from postgresql.conf when the postmaster
starts up. For the same reason, you cannot yet add custom compression
algorithms, besides quicklz, as an extension. But this is another step in
the right direction, anyway.
Co-authored-by: Jimmy Yih <jyih@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>

Rename variable to avoid risk of type collision · 5da162c4

Authored by Daniel Gustafsson on Sep 27, 2018

The comment states that "small" might be defined by socket.h, and while
that's not true for all versions of sys/socket.h, it's still not a good
name to use, as it's common in Windows headers (should we ever revive a
Windows port). Renaming to a non-colliding name is a small price to pay
to avoid subtle bugs, so rename it and remove the preprocessor dance.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

Refactor qp_functions test suite · 8f770d6d

Authored by Daniel Gustafsson on Sep 27, 2018

The test suite, which was ported over from TINC, was ignoring so much of
the memorized output that it more or less didn't test anything (and the
ignored blocks were as full of outdated output as one would imagine). The
code was also formatted in weird ways and threw needless NOTICEs during
execution.

This refactors the test suite to remove all ignore blocks, removes some
utterly pointless tests (there are many more of them left), formats the
code to be readable, fixes the output to work, and removes some duplicate
tests.

The remaining bits of the suite are by no means terribly interesting,
but they run fast enough that it's worth keeping the leftovers for now.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

Remove compile_gpdb_windows_cl job from master pipeline (#5804) · b0f124b5

Authored by Peifeng Qiu on Sep 27, 2018

Upstream has upgraded the Windows compile script and uses a newer
version of Perl. This may block the current merge effort. We plan to do
native Windows compilation for gpdb 6, so this job is no longer
necessary for gpdb_master.

Dispatcher can create flexible size gang (#5701) · a3ddac06

Authored by Tang Pengzhou on Sep 27, 2018

* change type of db_descriptors to SegmentDatabaseDescriptor **

A new gang may consist of both cached segdbDescs and newly created
ones, so there is no need to palloc every segdbDesc struct anew.

* Remove unnecessary allocate gang unit test

* Manage idle segment dbs using CdbComponentDatabases instead of available* lists.

To support variable-size gangs, segment dbs now need to be managed at a
finer granularity. Previously, idle QEs were managed by a bunch of lists
like availablePrimaryWriterGang and availableReaderGangsN, which
restricted the dispatcher to creating only N-size (N = number of
segments) or 1-size gangs.

CdbComponentDatabases is a snapshot of the segment components within the
current cluster, and it now maintains a freelist for each segment
component. When creating a gang, the dispatcher takes a segment db from
each segment component involved (from its freelist, or by creating a new
segment db). When cleaning up a gang, the dispatcher returns the idle
segment dbs to their segment components.

CdbComponentDatabases provides a few functions to manipulate segment dbs
(SegmentDatabaseDescriptor *):
* cdbcomponent_getCdbComponents
* cdbcomponent_destroyCdbComponents
* cdbcomponent_allocateIdleSegdb
* cdbcomponent_recycleIdleSegdb
* cdbcomponent_cleanupIdleSegdbs
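The per-segment freelist can be sketched in C as below. The struct and function names are hypothetical and deliberately differ from the real cdbcomponent_* functions listed above, whose signatures are not reproduced here. Each segment component keeps a linked list of idle QEs; building a gang for an arbitrary list of segments reuses an idle QE when one exists and creates one otherwise, and cleanup returns each QE to its own component.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a SegmentDatabaseDescriptor: one QE. */
typedef struct SegDb {
    int segindex;
    struct SegDb *next;   /* link in the owning component's freelist */
} SegDb;

/* Hypothetical per-segment component holding that segment's idle QEs. */
typedef struct Component {
    int segindex;
    SegDb *freelist;
    int created;          /* number of QEs ever created for this segment */
} Component;

/* Reuse an idle QE if there is one; otherwise create ("connect") one. */
static SegDb *allocate_idle_segdb(Component *comp)
{
    SegDb *db = comp->freelist;

    if (db != NULL) {
        comp->freelist = db->next;
    } else {
        db = malloc(sizeof *db);
        if (db == NULL)
            return NULL;
        db->segindex = comp->segindex;
        comp->created++;
    }
    db->next = NULL;
    return db;
}

/* Give a QE back to its component when its gang is cleaned up. */
static void recycle_idle_segdb(Component *comp, SegDb *db)
{
    assert(db->segindex == comp->segindex);
    db->next = comp->freelist;
    comp->freelist = db;
}

/* Destroy all idle QEs of a component, e.g. after an FTS version change. */
static void cleanup_idle_segdbs(Component *comp)
{
    while (comp->freelist != NULL) {
        SegDb *db = comp->freelist;

        comp->freelist = db->next;
        free(db);
    }
}

/* Build a gang of any size from a list of segment indexes. On failure,
 * QEs already taken are returned to their freelists. */
static int create_gang(Component *comps, const int *segs, int n, SegDb **out)
{
    for (int i = 0; i < n; i++) {
        out[i] = allocate_idle_segdb(&comps[segs[i]]);
        if (out[i] == NULL) {
            while (i-- > 0)
                recycle_idle_segdb(&comps[segs[i]], out[i]);
            return -1;
        }
    }
    return 0;
}
```

For example, a 2-member gang on segments {0, 2} of a 3-segment cluster takes one QE from each of those two components; after cleanup, the next request for segment 2 gets the same QE back from the freelist instead of opening a new connection.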

CdbComponentDatabases is also sensitive to the FTS version: once the FTS
version changes, CdbComponentDatabases destroys all idle segment dbs and
allocates QEs on the newly promoted segments. This makes mirror failover
transparent to users.

Since segment dbs (SegmentDatabaseDescriptor *) are managed by
CdbComponentDatabases now, we can simplify memory context management by
replacing GangContext & perGangContext with DispatcherContext &
CdbComponentsContext.

* Postpone the error handling when creating gang

Now that we have AtAbort_DispatcherState, we can postpone gang error
handling to that function and make the code cleaner.

* Handle FTS version change correctly

In some cases, when the FTS version changes, we can't update the current
snapshot of segment components; more specifically, we can't destroy the
current writer segment dbs and create new ones.

These cases include:
* the session has created a temp table.
* the query needs two-phase commit and the gxid has been dispatched to
  segments.

* Replace <gangId, sliceId> map with <qeIdentifier, sliceId> map

We used to dispatch a <gangId, sliceId> map along with the query to
segment dbs so that they know which slice to execute.

Now gangId is useless to a segment db, because a segment db can be
reused by different gangs, so we need a new way to pass this information
to segment dbs. To resolve this, CdbComponentDatabases assigns a unique
identifier to each segment db and builds, for each slice, a bitmap set of
the identifiers of the segment dbs that execute it. Segment dbs can then
go through the slice table and find the right slice to execute.
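A minimal sketch of the slice lookup on the QE side, assuming at most 64 QE identifiers so that a `uint64_t` can stand in for the bitmap set (GPDB's Bitmapset has no such limit). `Slice`, `slice_add_qe()` and `find_my_slice()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice table entry: which QEs execute this slice. */
typedef struct Slice {
    int slice_id;
    uint64_t qe_ids;    /* bit i set => the QE with identifier i runs it */
} Slice;

static void slice_add_qe(Slice *slice, int qe_id)
{
    assert(qe_id >= 0 && qe_id < 64);
    slice->qe_ids |= UINT64_C(1) << qe_id;
}

/* Run on the QE: scan the dispatched slice table for the slice whose
 * bitmap contains this QE's identifier. Returns -1 if there is none. */
static int find_my_slice(const Slice *table, int nslices, int my_qe_id)
{
    if (my_qe_id < 0 || my_qe_id >= 64)
        return -1;
    for (int i = 0; i < nslices; i++)
        if (table[i].qe_ids & (UINT64_C(1) << my_qe_id))
            return table[i].slice_id;
    return -1;
}
```

Since the identifier belongs to the QE rather than to a gang, the same QE finds its slice correctly no matter which gang it was borrowed into.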

* Allow dispatcher to create variable-size gangs and refine AssignGangs()

Previously, the dispatcher could only create N-size gangs for
GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the
dispatcher in many ways. One example is direct dispatch: it always
created an N-size gang even when dispatching the command to only one
segment. Another example is that some operations may be able to use an
N+ size gang, like a hash join: if both the inner and outer plans are
redistributed, the hash join node can be associated with an N+ size gang
to execute. This commit changes the API of createGang() so that the
caller can specify a list of segments (partial or even duplicate
segments); CdbComponentDatabases guarantees that each segment has only
one writer in a session. This also resolves another pain point of
AssignGangs(): the caller no longer needs to promote a
GANGTYPE_PRIMARY_READER to GANGTYPE_PRIMARY_WRITER, or promote a
GANGTYPE_SINGLETON_READER to GANGTYPE_PRIMARY_WRITER for replicated
tables (see FinalizeSliceTree()).

With this commit, AssignGangs() is much clearer now.

Remove remove_subquery_in_RTEs() call in standard_planner() (#5863) · 69cd1ec5

Authored by Paul Guo on Sep 27, 2018

As the comment said, this used to be useful; however, now that upstream's
add_rte_to_flat_rtable() handles that, let's remove this call.

Move pxf-infra to new consolidated repo · abccfe61

Authored by Divya Bhargov on Sep 26, 2018

Co-authored-by: Divya Bhargov <dbhargov@pivotal.io>
Co-authored-by: Lav Jain <ljain@pivotal.io>

Remove unused variable · cc853420

Authored by Daniel Gustafsson on Sep 26, 2018

Fixes a clang (and probably gcc) compiler warning about an unused variable.
Reviewed-by: Paul Guo <pguo@pivotal.io>
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>

Increase default value of wal_keep_segments GUC. · dd18c4a0

Authored by David Kimura on Sep 26, 2018

Until we have replication slots, this keeps enough xlog segments around
that mirrors have an opportunity to reconnect when a checkpoint removes a
segment while the mirror is not streaming.
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>

Mark various objects as internal, for purposes of object access hooks. · a673ddaa

Authored by Heikki Linnakangas on Sep 26, 2018

As far as I can see, the 'is_internal' flag is passed through to a
possible object access hook, but has no other effect. Mark the LOV index
and heap created for bitmap indexes, as well as constraints created for
exchanged partitions, as 'internal'.

Sep 26, 2018 · 12 commits

Remove FIXME around setting xlogid/xrecoff at bootstrapping. · e5d8a6b4

Authored by Heikki Linnakangas on Sep 26, 2018

I'm not entirely sure what was going on here before. I suspect we had
backported some fixes from later upstream versions, and they caused merge
conflicts and confusion now. But in any case, I see no reason to deviate
from upstream now, so just remove the FIXME.

Complete backport of upstream commits around lastReplayedEndRecPtr. · 2da73cea

Authored by Heikki Linnakangas on Sep 26, 2018

We had backported upstream commits 425bef6ee7 and 2cd72ba42d earlier, but
those got partially reverted in the 9.3 merge. Or earlier, or we hadn't
backported them completely to begin with - I didn't investigate the exact
path of how we got here. In any case, a partial backport is confusing, so
take the code around this from the tip of 9.3 stable, so that we have both
of those commits fully backported.

Adds an intention explaining method for if an aoentry is in use. · bb574831
Authored by Adam Berlin on Sep 25, 2018

Reduce duplicate knowledge of lightweight locks within appendonlywriter. · 938147e2
Authored by Adam Berlin on Sep 24, 2018

Renames function to better describe its intent. · 69cd05fe
Authored by Adam Berlin on Sep 24, 2018

Move logic into appendonly writer. Allow UI to decide how to respond to user. · ec1266b4
Authored by Adam Berlin on Sep 24, 2018

New gp_toolkit functions to get and remove entries from AppendOnlyHash · 1cbbe743

Authored by Asim R P on Sep 19, 2018

The functions allow obtaining or removing entries from the shared hash
table maintained on the QD. The default size of this hash table is 1000,
and entries are removed only after it is filled to capacity. The two
functions should be helpful for testing as well as for troubleshooting
issues with appendonly tables in production deployments.
Co-authored-by: Jimmy Yih <jyih@pivotal.io>

1cbbe743

Fix a bug in AO segment file selection for insert/update · ee86829d

Authored by Asim R P on Sep 18, 2018

A segment file that is compacted by vacuum is left in awaiting drop
state on QEs. Such a segment file should not be chosen for new inserts,
because it will never be considered for reading during scans. This patch
fixes a bug in the logic that determines whether a segment file is in
awaiting drop state. The bug requires a specific interleaving of vacuum
and insert transactions on the same appendonly table, as demonstrated in
the accompanying test. The fix is to use SnapshotNow instead of an MVCC
snapshot: a segment file whose state has been updated to awaiting drop
by a vacuum compaction transaction may still be seen as available for
inserts through an MVCC snapshot. While a vacuum compaction transaction
is in progress, the aoentry for the relation in the appendonly hash
cannot be evicted, and the need to obtain the state from QEs does not
arise.

Use correct attribute number in updating aoseg tuplecounts · 3b001e3e
Authored by Asim R P on Sep 18, 2018
```
Spotted while reading.
```

Validate state of an AO segment file before insertion and drop · 9b9e659c

Authored by Asim R P on Sep 14, 2018

This commit promotes a few assertions to elog(ERROR) to avoid new data
being appended to a segment file that is not in available state. Scans
on an AO table do not read segment files that are awaiting drop. New
data, if inserted into such a segment file, would be lost forever.
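The rule these checks enforce can be sketched as follows. The state names and functions are hypothetical, and the check reports failure with a return value and a message on stderr rather than with elog(ERROR). A segment file awaiting drop is never picked for inserts, and inserting into anything other than an available segment file is refused instead of silently losing the rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical AO segment file states. */
typedef enum SegFileState {
    SEGFILE_AVAILABLE,
    SEGFILE_IN_USE,          /* currently being written by another insert */
    SEGFILE_AWAITING_DROP    /* compacted by vacuum; scans skip it */
} SegFileState;

/* Pick a segment file for new rows. Files awaiting drop are never
 * chosen, because scans would never read the rows put there.
 * Returns -1 if no segment file is available. */
static int choose_segfile(const SegFileState *states, int n)
{
    for (int i = 0; i < n; i++)
        if (states[i] == SEGFILE_AVAILABLE)
            return i;
    return -1;
}

/* Check done right before appending, in the spirit of turning an
 * assertion into elog(ERROR): fail loudly instead of losing data. */
static bool validate_before_insert(SegFileState state, int segno)
{
    if (state != SEGFILE_AVAILABLE) {
        fprintf(stderr, "ERROR: segment file %d is not available for insert\n",
                segno);
        return false;
    }
    return true;
}
```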

The accompanying isolation2 test demonstrates a bug that hits these
errors.  The test uses a newly added UDF to evict an entry from the
appendonly hash table.  In production, an entry is evicted when the
appendonly hash table is filled (default capacity of 1000 entries).

Note: the bug will be fixed in a separate patch.
Co-authored-by: Adam Berlin <aberlin@pivotal.io>

Docs: Updating function xrefs to postgres · b2e98d44
Authored by David Yozie on Sep 25, 2018

Update scripts to refer to gpcontrib dir for gfhdfs test · 8f8813d1
Authored by Ekta Khanna on Sep 21, 2018
