- 27 Nov 2018, 9 commits
-
-
Committed by Heikki Linnakangas
In PostgreSQL, a PathKey represents sort ordering, but in GPDB we have also been using it in the planner to represent the distribution keys of hash-distributed data (i.e. the keys in DISTRIBUTED BY of a table, but also when data is redistributed by some other key on the fly). That's been convenient, and there's some precedent for it, since PostgreSQL also uses PathKey to represent GROUP BY columns, which are quite similar to DISTRIBUTED BY. However, there are some differences: the opfamily, strategy and nulls_first fields in PathKey are not applicable to distribution keys. Using the same struct to represent ordering and hash distribution is sometimes convenient, for example when we need to test whether the sort order or grouping is "compatible" with the distribution. But at other times, it's confusing. To clarify that, introduce a new DistributionKey struct to represent a hashed distribution.

While we're at it, simplify the representation of HashedOJ locus types by including a List of EquivalenceClasses in DistributionKey, rather than just one EC like a PathKey has. CdbPathLocus now has only one 'distkey' list that is used for both Hashed and HashedOJ loci, and it's a list of DistributionKeys. Each DistributionKey in turn can contain multiple EquivalenceClasses.

Looking ahead, I'm working on a patch to generalize the "cdbhash" mechanism, so that we'd use the normal Postgres hash opclasses for distribution keys, instead of hard-coding support for specific datatypes. With that, the hash operator class or family will be an important part of the distribution key, in addition to the datatype. The plan is to store that also in DistributionKey.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by xiong-gang
When EvalPlanQual materializes the slot into a heap tuple, PRIVATE_tts_values points to freed memory. We need to reset PRIVATE_tts_nvalid in ExecMaterializeSlot, to prevent the following ExecFilterJunk from referencing the dangling pointer.
-
Committed by Zhenghua Lyu
Previously the reshuffle node's numsegments was always set to the cluster size. Now that we have a flexible gang & dispatch API, we should correct the numsegments field of the reshuffle node by setting it to its lefttree's flow->numsegments.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
-
Committed by Zhenghua Lyu
When we expand a partially replicated table via `alter table t expand table`, internally we use the split-update framework to implement the expansion. That framework was originally designed for hash-distributed tables. For a replicated table, we do not need the reshuffle_expr (filter condition) at all, because we need to transfer all of the data in a replicated table.
-
Committed by Kalen Krempely
This allows the data to be visible on the segments. The segments should not interpret any transaction id from the master during or after upgrade.

Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
-
Committed by Kalen Krempely
Without this commit, auxiliary tables such as toast and aoseg are skipped during vacuum when run in utility mode (such as during pg_upgrade).

Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
-
Committed by Ashwin Agrawal
-
Committed by Ivan Leskin
Add a unit test (and its infrastructure) for 'zstd_compress()'. The test checks whether 'zstd_compress()' returns correct output in case compression fails (compressed data is larger than uncompressed). To do that, 'ZSTD_compressCCtx()' is mocked to always return 'ZSTD_error_dstSize_tooSmall'. Also, an 'ifndef' is added around 'PG_MODULE_MAGIC' in zstd_compression.c
-
Committed by Ivan Leskin
When ZSTD compression is used for AO CO tables, insertion of data may cause an error "Destination buffer is too small". This happens when compressed data is larger than uncompressed input data. This commit adds handling of this situation: do not change output buffer and return size used equal to source size. The caller (e.g., 'cdbappendonlystoragewrite.c') is able to handle such output; in this case, it copies data from input to output itself.
-
- 26 Nov 2018, 4 commits
-
-
Committed by Ning Yu
In CREATE TABLE we used to decide numsegments from the LIKE, INHERITS and DISTRIBUTED BY clauses. However, we do not want partially distributed tables to be created by end users, so change the logic to always create tables with DEFAULT numsegments. We still allow developers to hack the DEFAULT numsegments with the gp_debug_numsegments extension.
-
Committed by Daniel Gustafsson
Commit 226e8867 removed oidcasted_pk and max_content from the SQL query, but didn't remove the arguments. While they don't cause an issue, as they will simply be unused, remove them to avoid confusing readers.

Reviewed-by: Heikki Linnakangas
-
- 25 Nov 2018, 3 commits
-
-
Committed by Daniel Gustafsson
Commit 17f9b7f070dbe17b2844a8b4dd428 in the pgweb repository removed the /static/ portion on all doc URLs, leaving a redirect in place. To avoid incurring a needless redirect, remove the /static/ part in the links to the PostgreSQL documentation. The /static/ URLs stem from a time when there were interactive docs that had comment functionality. These docs were removed a very long time ago, but the static differentiator was left in place until now. Reviewed-by: Mel Kiyama
-
Committed by Heikki Linnakangas
With OpenSSL 1.1.0 and above, CRYPTO_set_id_callback and CRYPTO_set_locking_callback are no-op macros, which rendered id_function() and locking_function() unused. That produced compiler warnings.

Reviewed-by: Paul Guo <pguo@pivotal.io>
Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
Committed by Heikki Linnakangas
I was getting these compiler warnings:

    src/s3log.cpp: In function ‘void _LogMessage(const char*, __va_list_tag*)’:
    src/s3log.cpp:17:42: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
         vsnprintf(buf, sizeof(buf), fmt, args);
    src/s3log.cpp: In function ‘void _send_to_remote(const char*, __va_list_tag*)’:
    src/s3log.cpp:27:55: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
         size_t len = vsnprintf(buf, sizeof(buf), fmt, args);
    src/s3log.cpp: In function ‘void LogMessage(LOGLEVEL, const char*, ...)’:
    src/s3log.cpp:41:39: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
         vfprintf(stderr, fmt, args);

Those are good suggestions. I couldn't figure out the correct way to mark the _LogMessage() and _send_to_remote() local functions, so I decided to inline them into the caller, LogMessage(), instead. They were almost one-liners, and LogMessage() is still very small too, so I don't think there's any significant loss of readability.

A few format strings in debugging messages were treating pthread_self() as a pointer, while others were treating it as the wrong kind of integer. Harmonize by casting it to "uint64_t" and using PRIX64 as the format string. This isn't totally portable: pthread_t can be an arithmetic type or a struct, and casting a struct to an unsigned int won't work. In principle that was a problem before this patch already, but now you should get a compiler error if you try to compile on a platform where pthread_t is not an arithmetic type. I think that's better than silent type confusion.

Reviewed-by: Paul Guo <pguo@pivotal.io>
Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
- 24 Nov 2018, 1 commit
-
-
Committed by Heikki Linnakangas
Old error:

    ERROR: cannot use expression as distribution key, because it is not hashable (cdbmutate.c:1329)

The new error is the same as you get with CREATE TABLE. While we're at it, also change the "can't" contraction to "cannot" in the error message, to follow the PostgreSQL error message guidelines.
-
- 23 Nov 2018, 9 commits
-
-
Committed by BaiShaoqi
Revert four commits:

1. Remove Master/Standby SyncRepWait Greenplum hack: 7f6066ea
2. Add alter system synchronous_standby_names to * when gpinitstandby -n: 1136f2fb
3. Hot fix in gpinitstandby behave test failure: b6c77b2f
4. Remove unused variables, to silence compiler warnings: 88a185a5

The first commit should be reverted because gpinitstandby was not changed to conform to it: if the standby is down and synchronous_standby_names is *, the cluster will not start and will hang. The second, third, and fourth should be reverted because they depend on the first commit.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Shaoqi Bai
Co-authored-by: Ning Yu <nyu@pivotal.io>
-
Committed by Pengzhou Tang
Previously, when creating a join path between a CdbLocusType_SingleQE path and a CdbLocusType_SegmentGeneral path, we always add a motion on top of the CdbLocusType_SegmentGeneral path, so that even if the join path is promoted to execute on the QD, the CdbLocusType_SegmentGeneral path can still be executed on the segments:

            join (CdbLocusType_SingleQE)
           /                \
    CdbLocusType_SingleQE   Gather Motion
                                \
                        CdbLocusType_SegmentGeneral

For example, with (select * from partitioned_table limit 1) as t1:

    Nested Loop
      -> Gather Motion 1:1
         -> Seq Scan on replicated_table
      -> Materialize
         -> Subquery Scan on t1
            -> Limit
               -> Gather Motion 3:1
                  -> Limit
                     -> Seq Scan on partitioned_table

replicated_table only stores tuples on segments, so without the gather motion the seq scan of replicated_table doesn't provide tuples.

There is another problem: if the join path is not promoted to the QD, the gather motion might be redundant. For example, with (select * from replicated_table, (select * from partitioned_table limit 1) t1) sub1:

    Gather Motion 3:1
      -> Nested Loop
         -> Seq Scan on partitioned_table_2
         -> Materialize
            -> Broadcast Motion 1:3
               -> Nested Loop
                  -> Gather Motion 1:1 (redundant motion)
                     -> Seq Scan on replicated_table
                  -> Materialize
                     -> Subquery Scan on t1
                        -> Limit
                           -> Gather Motion 3:1
                              -> Limit
                                 -> Seq Scan on partitioned_table

So in apply_motion_mutator(), we omit such a redundant motion if it is not gathered to the top slice (QD). sliceDepth == 0 means it is the top slice; however, sliceDepth is shared by both init plans and the main plan, so if the main plan increased the sliceDepth, an init plan may omit the gather motion unexpectedly, which produces wrong results. The fix is simply to reset sliceDepth for init plans.
-
Committed by Pengzhou Tang
-
Committed by Pengzhou Tang
Implement "ALTER TABLE table EXPAND TABLE" to expand tables. "Expanding" and "Set Distributed By" are actually two different kinds of operations on tables; the old gpexpand used "Set Distributed By" to expand tables for historical reasons, and our early version of expansion was also squashed into "Set Distributed By", which made the code hard to understand and the concepts confusing. This commit separates "Expanding" from "Set Distributed By" entirely and implements "Expanding" with new syntax.

We have two methods to implement data movement, CTAS and RESHUFFLE, depending on how much data needs to move. If fewer than 10000 tuples need to move, choose RESHUFFLE; if the fraction of data to move is less than 30%, also choose RESHUFFLE; otherwise, choose CTAS.

For partitioned tables, we disallow expanding a leaf partition separately, because the root partition cannot have a different numsegments from its leaf partitions. SELECT/UPDATE would be fine if numsegments were inconsistent, but INSERT would cause trouble: data would be inserted in unexpected places.

The new syntax is only supposed to be used by gpexpand and is not exposed to normal users, so there is no need to update the documentation.
-
Committed by Ning Yu
When an Append node contains a SingleQE subpath, we used to put the Append on ALL the segments; however, if the SingleQE is partially distributed then we obviously cannot put the SingleQE on ALL the segments, and this conflict could result in runtime errors or incorrect results. To fix this we should put the Append on the SingleQE's segments. On the other hand, when there are multiple SingleQE subpaths, we should put the Append on the common segments of the SingleQEs.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Ning Yu
There are 3 reshuffle tests: the ao one, the co one, and the heap one. They share almost the same cases, but differ in table names and CREATE TABLE options. There are also some differences introduced when adding regression tests: they were only added in one file but not the others. We want to keep the differences between these tests minimal, so that a regression test for ao also covers the similar case for heap, and so that once we understand one of the test files we have almost the same knowledge of the others. Here is a list of changes to these tests:

- reduce differences in table names by using a schema;
- reduce differences in CREATE TABLE options by setting default storage options;
- simplify the creation of partially distributed tables by using the gp_debug_numsegments extension;
- copy some regression tests to all the tests;
- retire the no-longer-used helper function;
- move the tests into an existing parallel test group.

The pg_regress test framework provides some @@ tokens for ao/co tests; however, we still cannot merge the ao and co tests into one file, as WITH (OIDS) is only supported by ao but not co.
-
- 22 Nov 2018, 13 commits
-
-
Committed by Heikki Linnakangas
When determining the locus for a LEFT or RIGHT JOIN, we can use the outer side's distribution key as is. The EquivalenceClasses from the nullable side are not of interest above the join, and the outer side's distribution key can lead to better plans, because it can be made a Hashed locus rather than HashedOJ. A Hashed locus can be used for grouping, for example, unlike a HashedOJ. This buys back better plans for some INSERT and CTAS queries that started to need Redistribute Motions after the previous commit.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Heikki Linnakangas
There was some confusion about how NULLs are distributed when a CdbPathLocus is of Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested that NULLs can be on any segment. But the rest of the code assumed that that's true only for HashedOJ, and that for Hashed, all NULLs are stored on a particular segment. There was a comment in cdbgroup.c that said "Or would HashedOJ ok, too?"; the answer to that is "No!". Given the comment in cdbpathlocus.h, I'm not surprised that the author was not very sure about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that point.

There were a few cases where we got that actively wrong. The repartitionPlan() function is used to inject a Redistribute Motion into queries used for CREATE TABLE AS and INSERT, if the "current" locus doesn't match the target table's policy. It did not check for HashedOJ. Because of that, if the query contained FULL JOINs, NULL values might end up on all segments. Code elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table are stored on a single segment, identified by the cdbhash value of a NULL datum. Fix that by adding a check for HashedOJ in repartitionPlan(), and forcing a Redistribute Motion.

CREATE TABLE AS had a similar problem, in the code that decides which distribution key to use if the user didn't specify DISTRIBUTED BY explicitly. The default behaviour is to choose a distribution key that matches the distribution of the query, so that we can avoid adding an extra Redistribute Motion. After fixing repartitionPlan(), there was no correctness problem, but if we chose the key based on a HashedOJ locus, there is no performance benefit because we'd need a Redistribute Motion anyway. So modify the code that chooses the CTAS distribution key to ignore HashedOJ. While we're at it, refactor the code that chooses the CTAS distribution key by moving it to a separate function; it had become ridiculously deeply indented.
Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Heikki Linnakangas
The case where some, but not all, of the query's distribution keys were present in the result set, was not covered by any existing tests. Per Paul Guo's observation.
-
Committed by Heikki Linnakangas
Fix indentation. In the code to generate a NOTICE, remove if() for condition that we had checked earlier in the function already, and use a StringInfo for building the string.
-
Committed by Heikki Linnakangas
These were left behind by commit 7f6066ea.
-
Committed by Heikki Linnakangas
It returns a simple list of PathKeys, not a list of lists. The code was changed in the 8.3-era merge of equivalence classes already, but we neglected the comment.
-
Committed by Zhenghua Lyu
This commit is the first step in refactoring ATExecSetDistributedBy. Its main purpose is to remove some dead code in this function; in the process we found that some helper functions could also be simplified, so that simplification is included in this commit too.

According to MPP-7770, we should disable changing storage options for now. It is ugly to just throw an error on encountering the `appendonly` option without removing the code, so this commit removes all the related logic. Because the WITH clause can only contain reshuffle|reorganize, we only need new_rel_opts if the table itself is ao|aoco; there is no need to deduce it from the WITH clause. We also remove the unnecessary checks at the start of this function, because these checks have already been done in ATPrepCmd.

Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
-
Committed by Shaoqi Bai
After commit 22e04dc12df9e0577ba93a75dbef160c8c1ed258, the master will block when the standby master is down. A couple of things need to be done to unblock the master:

1. Run gpinitstandby -n to start the standby master back up.
2. Run psql postgres -c "ALTER SYSTEM SET synchronous_standby_names = '';" and reload the master segment. Note that ALTER SYSTEM SET has to be called again to set synchronous_standby_names back to '*' (and the master config reloaded) to enable synchronous replication again.

Rather than documenting this multi-step process, this commit combines it into a single gpinitstandby -n step.

Co-authored-by: Ning Yu <nyu@pivotal.io>
-
Committed by Jimmy Yih
When the standby master is unavailable, the master will not block on commits even though we enable synchronous replication. This is because we have a Greenplum hack which checks if the WAL stream with the standby master is valid. If the stream is invalid, the master will quickly skip the SyncRepWait and continue on its commit. Remove this hack in order to make Master/Standby and Primary/Mirror WAL replication more similar.

Co-authored-by: Shaoqi Bai <sbai@pivotal.io>
-
Committed by Ning Yu
Suppose t1 has numsegments=1 and t2 has numsegments=2; then the query below gets an incorrect plan:

    explain (costs off)
    select * from t2 a join t2 b using(c2)
    union all
    select * from t1 c join t1 d using(c2);

                                  QUERY PLAN
    ------------------------------------------------------------------------
     Gather Motion 1:1 (slice3; segments: 1)
       -> Append
          -> Hash Join
             Hash Cond: (a.c2 = b.c2)
             -> Redistribute Motion 2:2 (slice1; segments: 2)
                Hash Key: a.c2
                -> Seq Scan on t2 a
             -> Hash
                -> Redistribute Motion 2:2 (slice2; segments: 2)
                   Hash Key: b.c2
                   -> Seq Scan on t2 b
          -> Hash Join
             Hash Cond: (c.c2 = d.c2)
             -> Seq Scan on t1 c
             -> Hash
                -> Seq Scan on t1 d
     Optimizer: legacy query optimizer
    (17 rows)

slice2 has a 2:2 redistribute motion to slice3; however, slice3 only has 1 segment. This is because Append's numsegments is decided from the last subpath. To fix the issue we should use the max numsegments of the subpaths for Append. The issue was already fixed in 39856768; we are only adding tests for it now.
-
Committed by Ning Yu
Introduced a new debugging extension, gp_debug_numsegments, to get / set the default numsegments used when creating tables.

- gp_debug_get_create_table_default_numsegments() gets the default numsegments.
- gp_debug_set_create_table_default_numsegments(text) sets the default numsegments in text format; valid values are:
  - 'FULL': all the segments;
  - 'RANDOM': pick a random set of segments each time;
  - 'MINIMAL': the minimal set of segments.
- gp_debug_set_create_table_default_numsegments(integer) sets the default numsegments directly; the valid range is [1, gp_num_contents_in_cluster].
- gp_debug_reset_create_table_default_numsegments(text) or gp_debug_reset_create_table_default_numsegments(integer) resets the default numsegments to the specified value, and the value can be reused later.
- gp_debug_reset_create_table_default_numsegments() resets the default numsegments to the value passed last time; if there is no previous call to it, the value is 'FULL'.

Refactored the ICG test partial_table.sql to create partial tables with this extension.
-
Committed by Daniel Gustafsson
rel_partitioning_is_uniform() and addMCVToHashTable() inserted with HASH_ENTER and subsequently checked the return value for NULL in order to error out on "out of memory". HASH_ENTER, however, doesn't return if it couldn't insert: it errors out itself, so remove the check, as the condition cannot happen. groupHashNew() was using HASH_ENTER_NULL, which does return NULL in out-of-memory situations, but it failed to handle the return value correctly and dereferenced it without a check, risking a null pointer dereference under memory pressure. Fix by using HASH_ENTER instead, as the code clearly expects that behavior.

Reviewed-by: Paul Guo <paulguo@gmail.com>
-
Committed by Heikki Linnakangas
Commit cc2e211f attempted to silence the assertion in vac_update_relstats(), but the assertion it in turn added was hit heavily. The vacuum_appendonly_fill_stats() function, where I added the check for the zero-pages-with-non-zero-tuples combination, is also reached in QD mode, contrary to the comments and the assertion that I added. I'm not sure why we look at the totals in QD mode - AFAICS we just throw them away - but I'm reluctant to start restructuring this code right now. So move the code that zaps reltuples to 0 into vac_update_relstats().

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
- 21 Nov 2018, 1 commit
-
-
Committed by Daniel Gustafsson
In the unlikely event that we reach this codepath with a samplerows value of zero (which, albeit unlikely, could happen), avoid performing a division by zero and instead set the null fraction to zero, as we clearly don't have any more information to go on. The HLL code calls the compute_stats function pointer with zero samplerows, and while that's a different compute_stats function, it's an easy mistake to make when not all functions can handle a division by zero. This is defensive programming prompted by a report that triggered an old bug like this without actually hitting this one, but there is little reason to risk a crash. Suspenders go well with belts.

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-