1. 17 Nov 2020, 1 commit
    • Avoid checking distributed snapshot for visibility checks on QD · 48b13271
      Committed by Ashwin Agrawal
      This is a partial cherry-pick from commit
      b3f300b9.  On the QD, distributed
      transactions become visible at the same time as the corresponding
      local ones, so we can rely on the local XIDs. This holds because the
      modifications of the local procarray and globalXactArray are protected
      by a lock and hence form an atomic operation during transaction commit.
      
      We have seen many situations where catalog queries run very slowly on
      the QD, and a potential reason is checking the distributed logs. The
      process-local distributed log cache falls short for this use case, as
      most XIDs are unique and hence cause frequent cache misses. The shared
      memory cache falls short as well, since it caches only 8 pages while
      many more pages often need to be cached to be effective.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Gang Xiong <gangx@vmware.com>
  2. 21 Oct 2020, 1 commit
    • The inner relation of LASJ_NOTIN should not have partitioned locus · 343f8826
      Committed by Jinbao Chen
      The result of NULL NOT IN a non-empty set is false, while the result
      of NULL NOT IN an empty set is true. But if a non-empty set has
      partitioned locus, the set will be divided into several subsets, and
      some of those subsets may be empty. Because NULL NOT IN an empty set
      evaluates to true, some tuples that shouldn't exist in the result set
      will appear.
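      
      A minimal SQL illustration of the semantics described above (plain SQL
      behavior, independent of this patch):
      
      ```
      -- NULL NOT IN an empty set is true, so the outer row qualifies:
      SELECT 1 WHERE NULL NOT IN (SELECT 1 WHERE false);
      -- NULL NOT IN a non-empty set is not true (NULL), so the row is filtered:
      SELECT 1 WHERE NULL NOT IN (SELECT 1);
      ```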
      
      The patch disables the partitioned locus of the inner table by removing
      the join clause from the redistribution_clauses.
      
      This commit is cherry-picked from 6X_STABLE commit 8c93db54f3d93a890493f6a6d532f841779a9188.
      Co-authored-by: Hubert Zhang <hubertzhang@apache.org>
      Co-authored-by: Richard Guo <riguo@pivotal.io>
  3. 19 Sep 2020, 2 commits
    • Refactor query string truncation on top of 889ba39e · e393c88b
      Committed by Asim R P
      Commit 889ba39e fixed the query string truncation in the dispatcher to
      make it locale-aware.  This patch refactors that change so as to avoid
      accessing a string beyond its length.
      
      Reviewed by: Heikki, Ning Yu and Polina Bungina
      
      (cherry picked from commit abf6b330)
    • Fix query string truncation while dispatching to QE · b76d049b
      Committed by Polina Bungina
      Execution of a long enough query containing multi-byte characters can
      cause incorrect truncation of the query string. Incorrect truncation
      implies an occasional cut through a multi-byte character and (with
      log_min_duration_statement set to 0) a subsequent write of an invalid
      symbol to the segment logs. Such a broken character in the logs causes
      problems when fetching log info from the gp_toolkit.__gp_log_segment_ext
      table - queries fail with the following error: "ERROR: invalid byte
      sequence for encoding...".
      This is caused by the buildGpQueryString function in `cdbdisp_query.c`,
      which prepares the query text for dispatch to the QEs. It does not take
      character length into account when truncation is necessary (i.e. the
      text is longer than QUERY_STRING_TRUNCATE_SIZE).
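      
      As an illustration of where the broken character surfaces (only the
      table name and the error text come from this message), reading the
      segment logs afterwards fails:
      
      ```
      -- With log_min_duration_statement = 0, the truncated query string is
      -- logged on the segments; fetching the logs then errors out:
      SELECT count(*) FROM gp_toolkit.__gp_log_segment_ext;
      -- ERROR: invalid byte sequence for encoding...
      ```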
      
      (cherry picked from commit f31600e9)
  4. 17 Sep 2020, 1 commit
    • Do not read a persistent tuple after it is freed · 5f765a8e
      Committed by Asim R P
      This bug was found in a production environment where a vacuum on
      gp_persistent_relation was running concurrently with a backend
      performing end-of-xact filesystem operations, and the GUC
      debug_persistent_print was enabled.
      
      The *_ReadTuple() function was called on a persistent TID after the
      corresponding tuple had been deleted with a frozen transaction ID.  The
      concurrent vacuum recycled the tuple, which led to a SIGSEGV when the
      backend tried to access values from the tuple.
      
      Fix it by avoiding the debug log message in the case when the persistent
      tuple is freed (transitioning to the FREE state).  All other state
      transitions are logged.
      
      In the absence of a concurrent vacuum, things worked just fine because
      the *_ReadTuple() interface reads tuples from persistent tables directly
      using the TID.
  5. 04 Sep 2020, 1 commit
  6. 26 Aug 2020, 1 commit
    • PANIC when the shared memory is corrupted · 4f5a2c23
      Committed by xiong-gang
      shmNumGxacts and shmGxactArray are accessed under the protection of
      shmControlLock. This commit adds some defensive code and PANICs as early
      as possible when the shared memory is corrupted.
  7. 25 Aug 2020, 1 commit
    • Fix unexpected corruption of persistent filespace table (#10623) · 424e382a
      Committed by Tang Pengzhou
      On a segment whose primary is down and whose mirror has been promoted
      to primary, running gp_remove_segment_mirror to remove the mirror of
      the segment correctly cleans up the mirror-related fields in
      gp_persistent_filespace_node. But when we run gp_remove_segment_mirror
      for the same segment again, the primary-related fields are also cleaned
      up, which is wrong and not expected.
      
      Such a case was observed in production when gprecoverseg -F was
      interrupted in the middle of __updateSystemConfigRemoveAddMirror() and
      run again.
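      
      A hedged sketch of the sequence described above, assuming the
      single-argument form of gp_remove_segment_mirror() that takes the
      affected segment's content id (the exact signature is not shown in
      this message):
      
      ```
      -- First call: mirror-related fields in gp_persistent_filespace_node
      -- are cleaned up, as expected.
      SELECT gp_remove_segment_mirror(0::int2);
      -- Second call for the same segment: before this fix, the primary-related
      -- fields were also cleaned up, corrupting the persistent filespace table.
      SELECT gp_remove_segment_mirror(0::int2);
      ```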
      Reviewed-by: Asim R P <pasim@vmware.com>
  8. 15 Jul 2020, 1 commit
    • Fix pulling up EXPR sublinks · a6ee98bf
      Committed by Richard Guo
      Currently GPDB tries to pull up EXPR sublinks into inner joins. For the query
      
      select * from foo where foo.a >
          (select avg(bar.a) from bar where foo.b = bar.b);
      
      GPDB would transform it to:
      
      select * from foo inner join
          (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
      on foo.b = sub.b and foo.a > sub.avg;
      
      To do that, GPDB needs to recurse through the quals in the sub-select,
      extract quals of the form 'outervar = innervar', and then build new
      SortGroupClause items and TargetEntry items for the sub-select based on
      these quals.
      
      But for quals of the form 'function(outervar, innervar1) = innervar2',
      GPDB handles them incorrectly, which causes the wrong results described
      in issue #9615.
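      
      A hypothetical query of the problematic shape (illustrative only,
      assuming bar also has a column c): the correlated qual has the form
      'function(outervar, innervar1) = innervar2'.
      
      ```
      select * from foo where foo.a >
          (select avg(bar.a) from bar where foo.b + bar.b = bar.c);
      ```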
      
      This patch fixes the issue by treating these kinds of quals as not
      compatible for correlation, so that the sub-select is not converted to
      a join.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      (cherry picked from commit dcdc6c0b)
  9. 13 Jul 2020, 1 commit
    • Fix the assert failure on pullup flow in within group · 7246f370
      Committed by Jinbao Chen
      The Flow in the AggNode had the wrong TargetList. An AggNode has a
      different TargetList from its child nodes, so copying the flow directly
      from the child node to the AggNode is completely wrong. We need to use
      pullupflow to generate this TargetList when creating the within-group
      plan with a single QE.
  10. 24 Jun 2020, 1 commit
    • Fix a recursive AbortTransaction issue · 1a2454ab
      Committed by xiong-gang
      When an error happens after ProcArrayEndTransaction, execution recurses
      back into AbortTransaction; we need to make sure it does not generate
      extra WAL records and does not fail the assertions.
  11. 17 Apr 2020, 1 commit
    • Fix memory leak in checkpointer process. (#9730) · 7dee0229
      Committed by Hao Wu
      The checkpointer is a long-lived process, and the current memory context
      for its FOR loop is the process's own memory context. So any memory leak
      will make the checkpointer process hold more and more memory until the
      memory context is reset.
      The `rdata` used to build the xlog record has 5 pointers allocated by
      palloc/palloc0, and only one of them had its memory freed. It's better to
      free the memory here rather than resetting the memory context in the for
      loop of the checkpointer process.
  12. 21 Mar 2020, 1 commit
    • (5X) Enable external table's error log to be persistent for ETL. (#9759) · efe41b23
      Committed by (Jerome) Junfeng Yang
      For ETL user scenarios, there are cases that frequently create and drop
      the same external table. Once the external table gets dropped, all
      errors stored in its error log are lost.
      
      To make the error log persistent for an external table with the same
      "dbname"."namespace"."table", bring in the `LOG ERRORS PERSISTENTLY`
      clause. It is parsed into `error_log_persistent=true` in the table
      options to avoid a catalog change.
      If a user creates the external table with this clause, the external
      table's error log is named "dbid_namespaceid_tablename" under the
      "errlogpersistent" directory.
      
      Dropping the external table skips deleting this error log.
      
      A separate `gp_read_persistent_error_log` function is created to read
      the persistent error log.
      If the external table has been deleted, only the namespace owner has
      permission to delete the error log.
      
      A separate `gp_truncate_persistent_error_log` function is created to
      delete the persistent error log.
      If the external table has been deleted, only the namespace owner has
      permission to delete the error log.
      It also supports wildcard input to delete error logs belonging to a
      database or the whole cluster.
      
      If you drop an external table created with `error_log_persistent` and
      then create the same "dbname"."namespace"."table" external table without
      persistent error logging, errors are written to the normal error log.
      The persistent error log still exists.
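      
      A hedged usage sketch of the feature described above; the external table
      definition and the function argument forms are illustrative, only the
      clause and function names come from this message:
      
      ```
      CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
          LOCATION ('gpfdist://etl-host:8080/sales.txt')
          FORMAT 'TEXT'
          LOG ERRORS PERSISTENTLY SEGMENT REJECT LIMIT 100;
      
      -- The error log survives DROP EXTERNAL TABLE ext_sales; read it and
      -- clean it up afterwards with the new functions:
      SELECT * FROM gp_read_persistent_error_log('ext_sales');
      SELECT gp_truncate_persistent_error_log('ext_sales');
      ```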
      Reviewed-by: Haozhou Wang <hawang@pivotal.io>
      Reviewed-by: Adam Lee <ali@pivotal.io>
  13. 03 Feb 2020, 1 commit
    • error out if FileWrite() fails in MirroredAppendOnly_Append(). · b1276803
      Committed by Paul Guo
      This is safer coding. Some callers of MirroredAppendOnly_Append() seem
      to use Assert() to imply that FileWrite() could not fail on their code
      paths. I did not carefully check those call paths, but just
      conservatively error out in MirroredAppendOnly_Append() directly when
      FileWrite() goes wrong, considering that low-level IO function errors
      are dangerous and hard to debug.
  14. 16 Jan 2020, 2 commits
    • retryAbortPrepared only if previous attempt failed. · af9e0c83
      Committed by Ashwin Agrawal
      This was spotted by Pengzhou Tang during code inspection, so fixing it.
      I don't think it had any ill effect, but it is definitely unnecessary.
      
      Cherry-picked from bfb925fa
    • Reduce chances of master PANIC due to failure of phase 2 of 2PC. · 0fc8033d
      Committed by Ashwin Agrawal
      This commit increases the retry count and adds a small delay between
      retries for 2PC.
      
      Commit-Prepared or Abort-Prepared (phase 2) of 2PC performs retries if
      the first attempt fails to complete the transaction. By default only 2
      retries were performed, and with zero delay. Once the retries are
      exhausted, the master PANICs and has to continue retrying. Most of the
      time phase 2 fails on the first attempt because a segment is undergoing
      recovery or a failover to the mirror happens. In such instances, just 2
      retries attempted within milliseconds seem to defeat the purpose of the
      retries.
      
      Hence, modify the default number of retries to 10 and add a 100 msec
      delay between retries to provide a reasonable opportunity to succeed.
      This should help avoid master PANICs caused by being unable to complete
      phase 2. I gave it a lot of thought but couldn't think of any downsides
      to increasing the number of retries.
      
      Also, the maximum value allowed to be configured was only 15, which
      seems too restrictive, mainly for tests where a higher number of retries
      sometimes helps to avoid flakiness and master panics. So the maximum
      allowed value of the GUC `dtx_phase2_retry_count` is changed to INT_MAX.
      We don't practically expect it to be set to anything higher than a few
      thousand, but we don't have to be so restrictive about the maximum.
      
      Cherry-picked from f66054cd, with an additional, needed test change.
  15. 14 Jan 2020, 1 commit
    • Consider different cdbhash value in EC. · 652f8f7b
      Committed by Hubert Zhang
      This continues the fix for issue 8918 on GPDB 5X.
      
      In GPDB5, items in an EC may have different cdbhash values, which means
      their distribution keys may be different as well. For example, in
      T.D = <constant expr>, T.D may be float4 while <constant expr> is
      float8. Even if both are 1.0, we cannot assume they would be distributed
      to the same segment after motion. So cdbhash_type_compatibility_table is
      added to fix this issue.
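      
      An illustrative example of the case described above (hypothetical table;
      the point is that 1.0::float4 and 1.0::float8 need not hash to the same
      segment):
      
      ```
      create table t_f4 (d float4) distributed by (d);
      -- T.D is float4 while the constant is float8; even though both are 1.0,
      -- their cdbhash values may differ, so equality here does not imply they
      -- land on the same segment after motion.
      select * from t_f4 where d = 1.0::float8;
      ```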
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
  16. 09 Jan 2020, 1 commit
    • Print warning message in checkNetworkTimeout. · 86ca712a
      Committed by Hubert Zhang
      Packets may be dropped repeatedly on some specific ports, and we need a
      way to quickly identify this issue. But when the network is bad, packets
      are also dropped. In the past, checkNetworkTimeout would only error out
      when a packet had failed to receive an ACK for more than one hour (see
      the GUC gp_interconnect_transmit_timeout), which is too strict.
      
      This commit introduces a warning message to report this possible
      problem, so that a DBA can examine the port further.
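      
      For reference, the GUC mentioned above can be inspected as shown below
      (illustrative only):
      
      ```
      -- Defaults to one hour; checkNetworkTimeout errors out only after a
      -- packet has gone unacknowledged for this long.
      SHOW gp_interconnect_transmit_timeout;
      ```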
  17. 07 Jan 2020, 1 commit
  18. 24 Dec 2019, 1 commit
  19. 23 Dec 2019, 2 commits
    • Remove Motion codepath to detoast HeapTuples, convert to MemTuple instead. · e2b63204
      Committed by Heikki Linnakangas
      The Motion sender code has four different codepaths for serializing a
      tuple from the input slot:
      
      1. Fetch MemTuple from slot, copy it out as it is.
      
      2. Fetch MemTuple from slot, re-format it into a new MemTuple by fetching
         and inlining any toasted datums. Copy out the re-formatted MemTuple.
      
      3. Fetch HeapTuple from slot, copy it out as it is.
      
      4. Fetch HeapTuple from slot, copy out each attribute separately, fetching
         and inlining any toasted datums.
      
      In addition to the above, there are "direct" versions of codepaths 1 and 3,
      used when the tuple fits in the caller-provided output buffer.
      
      As discussed in https://github.com/greenplum-db/gpdb/issues/9253, the
      fourth codepath is very inefficient, if the input tuple contains datums
      that are compressed inline, but not toasted. We decompress such tuples
      before serializing, and in the worst case, might need to recompress them
      again in the receiver if it's written out to a table. I tried to fix that
      in commit 4c7f6cf7, but it was broken and was reverted in commit
      774613a8.
      
      This is a new attempt at fixing the issue. This commit removes codepath
      4 altogether, so that if the input tuple is a HeapTuple with any toasted
      attributes, it is first converted to a MemTuple and codepath 2 is used
      to serialize it. That way, we have less code to test, and materializing a
      MemTuple is roughly as fast as the old code to write out the attributes
      of a HeapTuple one by one, except that the MemTuple codepath avoids the
      decompression of already-compressed datums.
      
      While we're at it, add some tests for the various codepaths through
      SerializeTuple().
      
      To test the performance of the affected case, where the input tuple is
      a HeapTuple with toasted datums, I used this:
      
      ---
      CREATE temporary TABLE foo (a text, b text, c text, d text, e text, f text,
        g text, h text, i text, j text, k text, l text, m text, n text, o text,
        p text, q text, r text, s text, t text, u text, v text, w text, x text,
        y text, z text, large text);
      ALTER TABLE foo ALTER COLUMN large SET STORAGE external;
      INSERT INTO foo
        SELECT 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
               'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
               repeat('1234567890', 1000)
        FROM generate_series(1, 10000);
      
      -- verify that the data is uncompressed, should be about 110 MB.
      SELECT pg_total_relation_size('foo');
      
      \o /dev/null
      \timing on
      SELECT * FROM foo; -- repeat a few times
      ---
      
      The last select took about 380 ms on my laptop, with or without this patch.
      So the new codepath where the input HeapTuple is converted to a MemTuple
      first, is about as fast as the old method. There might be small differences
      in the serialized size of the tuple, too, but I didn't explicitly measure
      that. If you have a toasted but not compressed datum, the input must be
      quite large, so small differences in the datum header sizes shouldn't
      matter much.
      
      If the input HeapTuple contains any compressed datums, this avoids the
      recompression, so even if converting to a MemTuple was somewhat slower in
      that case, it should still be much better than before. I kept the
      HeapTuple codepath for the case that there are no toasted datums. I'm not
      sure it's significantly faster than converting to a MemTuple either; the
      caller has to slot_deform_tuple() the received tuple before it can do
      much with it, and that is slower with HeapTuples than MemTuples. But that
      codepath is straightforward enough that getting rid of it wouldn't save
      much code, and I don't feel like doing the performance testing to justify
      it right now.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
    • Remove unnecessary checks for NULL return from getChunkFromCache. · 7ffb084f
      Committed by Heikki Linnakangas
      It cannot return NULL. It will either return a valid pointer, or the
      palloc() will ERROR out.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  20. 18 Dec 2019, 1 commit
    • Revert "When serializing a tuple for Motion, don't decompress compressed datums." · 6fbe87ad
      Committed by Asim R P
      This reverts commit 788e3e7e.
      
      Thank you Ekta for finding this simple repro that demonstrates the
      problem with this patch and Jesse for initial analysis:
      
         CREATE TABLE foo(a text, b text);
         INSERT INTO foo SELECT repeat('123456789', 100000)::text as a,
                                repeat('123456789', 10)::text as b;
         SELECT * FROM foo;
      
      The motion receiver has no idea whether a datum it received is
      compressed or not, because the varlena header is stripped off before
      sending the data.  Heikki and I discussed two options to fix this:
      
      1. Include the varlena header when sending.  This incurs at most an
      8-byte overhead per variable-length datum in a heap tuple.
      
      2. Always send tuples as MemTuples.  This is more desirable because it
      simplifies the code, but it also comes with a performance cost.
      
      Let's evaluate the two options based on performance and then commit the
      best one.
  21. 17 Dec 2019, 1 commit
  22. 03 Dec 2019, 1 commit
  23. 26 Nov 2019, 1 commit
    • Use MVCC snapshot for gp_segment_configuration scan · 5c6f0e24
      Committed by Ashwin Agrawal
      SnapshotNow scans have the undesirable property that, in the face of
      concurrent updates, the scan can fail to see either the old or the new
      version of a row. As a result, getCdbComponentInfo(), using SnapshotNow
      and AccessShareLock, may see duplicate entries for a dbid if concurrent
      updates are performed, mostly by FTS. Hence, instead use an MVCC
      snapshot, similar to what's done in createdb() for pg_tablespace scans.
      
      With this change an MVCC snapshot is used whenever getCdbComponentInfo()
      is called inside a transaction. The only exception is when
      getCdbComponentInfo() is called outside of a transaction, which happens
      in phase 2 of 2PC: if COMMIT_PREPARED or ABORT_PREPARED fails, the
      dispatcher disconnects and destroys all gangs, and then performs
      RETRY_COMMIT_PREPARED or RETRY_ABORT_PREPARED. Since in phase 2 we have
      already marked the current transaction state as TRANS_COMMIT/ABORT and
      MyProc->xmin is set to 0, we cannot acquire a transaction snapshot. In
      6X_STABLE and higher versions this situation was avoided via commit
      eb036ac1, but it currently seems complicated to backport that change,
      so we continue to use SnapshotNow for this special case. For
      gp_segment_configuration we perform sanity checks after scanning and
      can detect the undesirable result. If this continues to be a problem,
      we can in the future add logic for FTS to write the contents and use
      the same for 5X_STABLE too.
      Reviewed-by: David Kimura <dkimura@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
  24. 25 Nov 2019, 1 commit
  25. 20 Nov 2019, 1 commit
  26. 15 Nov 2019, 3 commits
    • Revert "Use MVCC snapshot for gp_segment_configuration scan" · cb2b74fa
      Committed by Ashwin Agrawal
      This reverts commit 8de85665. The ICW jobs sometimes fail with the error
      "must be called before any query". We probably need a different way to
      grab the snapshot.
    • Use MVCC snapshot for gp_segment_configuration scan · 8de85665
      Committed by Ashwin Agrawal
      SnapshotNow scans have the undesirable property that, in the face of
      concurrent updates, the scan can fail to see either the old or the new
      version of a row. As a result, getCdbComponentInfo(), using SnapshotNow
      and AccessShareLock, may see duplicate entries for a dbid if concurrent
      updates are performed, mostly by FTS. Hence, instead use an MVCC
      snapshot, similar to what's done in createdb() for pg_tablespace scans.
      
      With this change an MVCC snapshot is used whenever getCdbComponentInfo()
      is called inside a transaction. The only exception is when
      getCdbComponentInfo() is called outside of a transaction, which happens
      in phase 2 of 2PC: if COMMIT_PREPARED or ABORT_PREPARED fails, the
      dispatcher disconnects and destroys all gangs, and then performs
      RETRY_COMMIT_PREPARED or RETRY_ABORT_PREPARED. Since in phase 2 we have
      already marked the current transaction state as TRANS_COMMIT/ABORT and
      MyProc->xmin is set to 0, we cannot acquire a transaction snapshot. In
      6X_STABLE and higher versions this situation was avoided via commit
      eb036ac1, but it currently seems complicated to backport that change,
      so we continue to use SnapshotNow for this special case. For
      gp_segment_configuration we perform sanity checks after scanning and
      can detect the undesirable result. If this continues to be a problem,
      we can in the future add logic for FTS to write the contents and use
      the same for 5X_STABLE too.
      Reviewed-by: David Kimura <dkimura@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
    • Avoid memory-corruption during buildGangDefinition · 24ccc0c5
      Committed by Ashwin Agrawal
      If the code found more primaries in cdb_component_dbs than the size of
      the gang to be created, it would end up writing to incorrect memory
      addresses beyond the allocated memory. Hence, add protection for this
      and ERROR out sooner rather than later.
      Reviewed-by: David Kimura <dkimura@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Zhenghua Lyu <kainwen@gmail.com>
  27. 05 Nov 2019, 1 commit
  28. 24 Oct 2019, 1 commit
    • Add missing break and fallthrough comment within switch-case (#8824) · 9dae1d96
      Committed by Hao Wu
      GCC 7+ has a compile option, -Wimplicit-fallthrough, which generates a
      warning/error if the code falls through cases (or default) implicitly.
      Such implicit fall-through may cause bugs that are hard to catch.
      
      1. Append a comment line such as /* fallthrough */ at the end of a case
         block.
      2. Add a break clause at the end of a case block if the last statement
         is ereport(ERROR) or the like.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  29. 16 Oct 2019, 1 commit
    • ic: tcp: init incoming conns before outgoing conns · 5b76f9b5
      Committed by Ning Yu
      In SetupTCPInterconnect() we initialize both incoming and outgoing
      connections. A state pointer, sendingChunkTransportState, is created to
      track the status of the outgoing connections; it is an entry of the
      states array, and we expect the pointer to stay valid for the duration
      of the function.
      
      However, after we get this pointer we initialize the incoming
      connections, which may resize the states array with repalloc(), so
      sendingChunkTransportState can end up pointing to invalid memory and
      crash at runtime.
      
      To fix that, initialize the incoming connections before the outgoing
      ones, so that the sendingChunkTransportState pointer stays valid during
      its lifecycle.
      
      Tests are not added, as the issue has a chance of being triggered by
      existing tests.
      
      (cherry picked from commit 296dba82)
  30. 05 Oct 2019, 1 commit
    • Bump ORCA version to 3.74.0, Introduce PallocMemoryPool for use in GPORCA (#8747) · a3266308
      Committed by Chris Hajas
      We introduce a new type of memory pool and memory pool manager:
      CMemoryPoolPalloc and CMemoryPoolPallocManager
      
      The motivation for this PR is to improve memory allocation/deallocation
      performance when using GPDB allocators. Additionally, we would like to
      use the GPDB memory allocators by default (change the default for
      optimizer_use_gpdb_allocators to on), to prevent ORCA from crashing when
      we run out of memory (OOM). However, with the current way of doing
      things, doing so would add around 10 % performance overhead to ORCA.
      
      CMemoryPoolPallocManager overrides the default CMemoryPoolManager in
      ORCA, and instead creates a CMemoryPoolPalloc memory pool instead of a
      CMemoryPoolTracker. In CMemoryPoolPalloc, we now call MemoryContextAlloc
      and pfree instead of gp_malloc and gp_free, and we don’t do any memory
      accounting.
      
      So where does the performance improvement come from? Previously, we
      would (essentially) pass in gp_malloc and gp_free to an underlying
      allocation structure (which has been removed on the ORCA side). However,
      we would add additional headers and overhead to maintain a list of all
      of these allocations. When tearing down the memory pool, we would
      iterate through the list of allocations and explicitly free each one. So
      we would end up doing overhead on the ORCA side, AND the GPDB side.
      However, the overhead on both sides was quite expensive!
      
      If you want to compare against the previous implementation, see the
      Allocate and Teardown functions in CMemoryPoolTracker.
      
      With this PR, we improve optimization time by ~15% on average and up to
      30-40% on some queries which are memory intensive.
      
      This PR does remove memory accounting in ORCA. This was only enabled
      when the optimizer_use_gpdb_allocators GUC was set. By setting
      `optimizer_use_gpdb_allocators`, we still capture the memory used when
      optimizing a query in ORCA, without the overhead of the memory
      accounting framework.
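      
      For reference, the GUC discussed above can be enabled per session
      (illustrative only):
      
      ```
      -- Route ORCA allocations through the GPDB (palloc-based) memory pools.
      SET optimizer_use_gpdb_allocators = on;
      ```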
      
      Additionally, add a top-level ORCA context where new contexts are created.
      
      The OptimizerMemoryContext is initialized in InitPostgres(). For each
      memory pool in ORCA, a new memory context is created in
      OptimizerMemoryContext.
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
  31. 27 Sep 2019, 1 commit
    • Fix crash in COPY FROM if error happens · 1bbbcc09
      Committed by Ashwin Agrawal
      If an error happens in CopyFrom() while cdbCopy is still NULL, then when
      PG_CATCH() calls COPY_HANDLE_ERROR it triggers a PANIC. Hence, check for
      NULL in cdbCopyEndAndFetchRejectNum().
      
      The crash was exposed by following SQL commands:
      
          CREATE TABLE public.heap01 (a int, b int) distributed by (a);
          INSERT INTO public.heap01 VALUES (generate_series(0,99), generate_series(0,98));
          ANALYZE public.heap01;
      
          COPY (select * from pg_statistic where starelid = 'public.heap01'::regclass) TO '/tmp/heap01.stat';
          DELETE FROM pg_statistic where starelid = 'public.heap01'::regclass;
          COPY pg_statistic from '/tmp/heap01.stat';
      
      Important note: yes, it's known and strongly recommended not to touch
      `pg_statistic` or any other catalog table this way. But it's no good to
      panic either. The COPY into `pg_statistic` is going to ERROR out
      "correctly" and not crash after this change, with `cannot accept a
      value of type anyarray`, as there just isn't any way at the SQL level
      to insert data into pg_statistic's anyarray columns. Refer:
      https://www.postgresql.org/message-id/12138.1277130186%40sss.pgh.pa.us
  32. 19 Sep 2019, 1 commit
    • Fix wrong results caused by NOT_EXISTS sublink elimination. · ff2582bb
      Committed by Richard Guo
      In GPDB 5X, we try to eliminate a NOT_EXISTS sublink if there is
      'limit 0' in the subquery. To do that, we delete the subquery's LIMIT
      and build an (n <= 0) expr to be ANDed into the parent qual, so that
      the limit can be evaluated at run time. However, in the case when n is
      a positive number (or NULL), the expr (n <= 0) evaluates to false,
      which causes the whole parent qual to become false. As a result, we
      get wrong results for the query below:
      
      ```
      create table foo(a int);
      insert into foo values (1);
      
      create table bar(b int);
      
      select * from foo where not exists
      		(select 1 from bar where bar.b = foo.a limit 1);
      ```
      
      This patch fixes the wrong results by evaluating the limit at plan time
      and returning true to be ANDed into the parent qual if n <= 0. If n is
      a positive value (or NULL), however, the LIMIT doesn't affect the
      semantics of EXISTS, so this patch just ignores it.
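      
      A hypothetical variant of the query above with LIMIT 0, the case this
      elimination targets: the subquery can never return rows, so NOT EXISTS
      is always true and the row from foo must be returned.
      
      ```
      select * from foo where not exists
      		(select 1 from bar where bar.b = foo.a limit 0);
      -- must return the single row of foo, since the subquery is always empty
      ```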
      
      This patch fixes github issue #8369.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  33. 28 Aug 2019, 1 commit
    • Do additional cleanup when setting udp interconnect fails to avoid potential panic. (#8430) · e8391480
      Committed by Paul Guo
      We've seen occasional test failures of icudp/icudp_full due to an
      unexpected panic of the QE process.  That happens when a QE main process
      calls elog(ERROR) in SetupUDPIFCInterconnect_Internal() while its rx
      pthread is handling rx packets.  The memory (in memory context
      InterconnectContext) which is used to handle rx packets is soon reset in
      the resource owner ReleaseCallback function destroy_interconnect_handle().
      
      Fix this by removing the connection entries from the hash table when
      SetupUDPIFCInterconnect_Internal() errors out.
      
      Cherry-picked from 10dba408
      
      In addition, enable the DPICFullTestCases.test_icudp_full test.
      According to the error message, it is really unreasonable to keep
      the test case disabled.
  34. 21 Aug 2019, 1 commit
    • Fix interconnect retransmission period · fd5af3ee
      Committed by xiong-gang
      For a query like:
      SELECT ... ORDER BY key LIMIT n;
      
      When the query running time is longer than UNACK_QUEUE_RING_LENGTH *
      TIMER_SPAN (10 seconds by default) and the first tuple sent from the
      OrderBy node to the QD is not acknowledged, the function
      putIntoUnackQueueRing calculates an inappropriate retransmission period
      of UNACK_QUEUE_RING_LENGTH * TIMER_SPAN.
  35. 16 Aug 2019, 1 commit
    • Fix parser algorithm for setting distribution key column numbers · 8e6af8a7
      Committed by Denis Smirnov
      in the gp_distribution_policy relation for inherited tables in GP5X
      (GP6X is ok).
      Query plans built with GPORCA caused a segmentation fault because of
      out-of-range column numbers, while the Postgres optimizer simply
      returned an error before this patch. For example:
      
      create table ta (a int) distributed randomly;
      create table tb (a int, b int) inherits (ta) distributed by (b);
      
      set optimizer=on;
      insert into tb values(0, 0);
      -- Segmentation fault
      
      set optimizer=off;
      insert into tb values(0, 0);
      -- ERROR: no tlist entry for key 3 (cdbmutate.c:1484)
      
      select attrnums from gp_distribution_policy where localoid = 'tb'::regclass;
       attrnums
      ----------
      {3}
      (1 row)
      
      Also, a check against setting a non-hashable distribution key from a
      parent table in an inherited one didn't work.
      
      create table tc (a point) distributed randomly;
      create table td (b int) inherits (tc) distributed by (a);
      
      select * from td;
      ERROR:  could not find mergejoinable = operator for type 600 (pathkeys.c:1174)
      Co-authored-by: Vasiliy Ivanov <7vasiliy@gmail.com>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>