1. 25 Jan 2019 (1 commit)
  2. 24 Jan 2019 (18 commits)
    • Tidy up error reporting in gp_sparse_vector a little · 9208b8f3
      Daniel Gustafsson committed
      This cleans up the error messages in the sparse vector code a little
      by ensuring they mostly conform to the style guide for error handling.
      Also fixes a nearby typo and removes commented-out elogs which are
      clearly dead code.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Fix up incorrect license statements for module · 41e8ed17
      Daniel Gustafsson committed
      The gp_sparse_vector module was covered by the relicensing done as
      part of the Greenplum open sourcing, but a few mentions of the previous
      licensing remained in the code. The legal situation of this code has
      been reviewed and cleared by Pivotal legal, so remove the incorrect
      statements and replace them with the standard copyright file headers.

      This also cleans up a few comments while at it.

      Reviewed-by: Cyrus Wadia
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Remove unused header file · 0c1add31
      Daniel Gustafsson committed
      The float_specials.h header was removed shortly after this contrib
      module was imported in 2010, and has been dead code since. Remove.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Remove unused and inline single-use functions · f1277a5c
      Daniel Gustafsson committed
      This removes a few unused functions, and inlines the function body of
      another one which only had a single caller. Also properly mark a
      few functions as static.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Stabilize gp_sparse_vector test · ccfe82b7
      Daniel Gustafsson committed
      Remove a redundant test on array_agg which didn't have stable output, and
      remove an ORDER BY to let atmsort deal with the differences instead.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Allocate histogram with palloc to avoid memleak · 0d4908ee
      Daniel Gustafsson committed
      The histogram structure was allocated statically via malloc(), but it
      had no data retention between calls; it was purely a micro-optimization
      to avoid the cost of repeated allocations. This led to the allocated
      memory leaking, as it is not cleaned up automatically. Fix by palloc'ing
      the memory instead and accepting the cost of repeated allocation.

      Also ensure that allocated memory is properly cleaned up in failure cases.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • d613a759
    • Fix memory management in gp_sparse_vector · 970a0395
      Daniel Gustafsson committed
      palloc() is guaranteed to only return on successful allocation, so there
      is no need to check its result. ereport(ERROR, ...) is guaranteed never
      to return, and to clean up on its way out, so pfree()ing after an
      ereport() is not just unreachable code; it would be a double-free if it
      were reached.

      Also add proper checks on the malloc() and strdup() calls, as those must
      be checked for failure by the programmer in the usual way.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • pg_dump: free temporary variable qualTmpExtTable · 68b11d46
      Adam Lee committed
      This part of the code is not covered by the PR pipeline; it was tested
      manually.
    • pg_dump: fix dropping temp external table failure on CI · ed940ab6
      Adam Lee committed
      fmtQualifiedId() and fmtId() share the same output buffer, so we cannot
      call one of them again until we are finished with the result of the
      previous call.
    • Don't choose indexscan when you need a motion for a subplan (#6665) · cd055f99
      Melanie committed

      When you have a subquery under a SUBLINK that might get pulled up, you
      should not allow indexscans to be chosen for the relation which is the
      range table for the subquery. If that relation is distributed and the
      subquery is pulled up, you will need to redistribute or broadcast that
      relation and materialize it on the segments, and cdbparallelize will not
      add a motion and materialize an indexscan, so you cannot use an indexscan
      in these cases.
      You can't materialize an indexscan because it will materialize only one
      tuple at a time, and when you compare that to the param you get from the
      relation on the segments, you can get wrong results.

      Because we don't pick indexscans very often, we don't see this issue very
      often. You need a subquery referring to a distributed table in a subplan
      which, during planning, gets pulled up, and then when adding paths, the
      indexscan is cheapest.
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
    • Remove stale replication slots on mirrors. · fa09dd80
      David Kimura committed
      Stale replication slots can exist on mirrors that were once acting as
      primaries. In this case restart_lsn is a non-zero value left over from
      the past replication slot setup. The stale replication slot continues to
      retain xlog on the mirror, which is problematic and unnecessary.

      This patch drops the internal replication slot on startup of the mirror.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
    • Prevent int128 from requiring more than MAXALIGN alignment. · d74dd56a
      Jesse Zhang committed
      We backported 128-bit integer support to speed up aggregates (commits
      8122e143 and 959277a4) from upstream 9.6 into Greenplum (in
      commits 9b164486 and 325e6fcd). However, we forgot to also port a
      follow-up fix, postgres/postgres@7518049980b, mostly because it's nuanced
      and hard to reproduce.

      There are two ways to observe the breakage:

      1. On a lucky day, tests would fail on my workstation, but not my laptop (or
         vice versa).

      2. If you stare at the generated code for `int8_avg_combine` (and friends),
         you'll notice the compiler uses "aligned" instructions like `movaps` and
         `movdqa` (on AMD64).
      
      Today's my lucky day.
      
      Original commit message from postgres/postgres@7518049980b (by Tom Lane):
      
      > Our initial work with int128 neglected alignment considerations, an
      > oversight that came back to bite us in bug #14897 from Vincent Lachenal.
      > It is unsurprising that int128 might have a 16-byte alignment requirement;
      > what's slightly more surprising is that even notoriously lax Intel chips
      > sometimes enforce that.
      
      > Raising MAXALIGN seems out of the question: the costs in wasted disk and
      > memory space would be significant, and there would also be an on-disk
      > compatibility break.  Nor does it seem very practical to try to allow some
      > data structures to have more-than-MAXALIGN alignment requirement, as we'd
      > have to push knowledge of that throughout various code that copies data
      > structures around.
      
      > The only way out of the box is to make type int128 conform to the system's
      > alignment assumptions.  Fortunately, gcc supports that via its
      > __attribute__(aligned()) pragma; and since we don't currently support
      > int128 on non-gcc-workalike compilers, we shouldn't be losing any platform
      > support this way.
      
      > Although we could have just done pg_attribute_aligned(MAXIMUM_ALIGNOF) and
      > called it a day, I did a little bit of extra work to make the code more
      > portable than that: it will also support int128 on compilers without
      > __attribute__(aligned()), if the native alignment of their 128-bit-int
      > type is no more than that of int64.
      
      > Add a regression test case that exercises the one known instance of the
      > problem, in parallel aggregation over a bigint column.
      
      > This will need to be back-patched, along with the preparatory commit
      > 91aec93e.  But let's see what the buildfarm makes of it first.
      
      > Discussion: https://postgr.es/m/20171110185747.31519.28038@wrigleys.postgresql.org
      
      (cherry picked from commit 75180499)
    • Rearrange c.h to create a "compiler characteristics" section. · 60a08bc2
      Jesse Zhang committed
      This cherry-picks 91aec93e. We had to be extra careful to preserve
      the still-in-use macros UnusedArg and STATIC_IF_INLINE and friends.
      
      > Generalize section 1 to handle stuff that is principally about the
      > compiler (not libraries), such as attributes, and collect stuff there
      > that had been dropped into various other parts of c.h.  Also, push
      > all the gettext macros into section 8, so that section 0 is really
      > just inclusions rather than inclusions and random other stuff.
      
      > The primary goal here is to get pg_attribute_aligned() defined before
      > section 3, so that we can use it with int128.  But this seems like good
      > cleanup anyway.
      
      > This patch just moves macro definitions around, and shouldn't result
      > in any changes in generated code.  But I'll push it out separately
      > to see if the buildfarm agrees.
      
      > Discussion: https://postgr.es/m/20171110185747.31519.28038@wrigleys.postgresql.org
      
      (cherry picked from commit 91aec93e)
    • Update GDD to not assign global transaction ids · e24ddd70
      David Kimura committed
      Currently GDD sets DistributedTransactionContext to
      DTX_CONTEXT_QD_DISTRIBUTED_CAPABLE and as a result allocates a
      distributed transaction id. It creates an entry in
      ProcGlobal->allTmGxact with state DTX_STATE_ACTIVE_NOT_DISTRIBUTED. The
      effect of this is that any query taking a snapshot sees this
      transaction as in progress. Since the GDD transaction is short-lived
      this is not an issue in general, but in CI it causes flaky behavior for
      some of the vacuum tests. The flaky behavior shows up as unvacuumed
      tables, where the vacuum snapshot was taken while the GDD transaction
      was running, forcing vacuum to lower its oldest XMIN. The current
      behavior of GDD consuming a distributed transaction id (every 2 minutes
      by default) is also wasteful.

      Currently GDD also sends a snapshot to the QEs, but this isn't required
      and is wasteful as well.

      With this change GDD keeps DistributedTransactionContext as
      DTX_CONTEXT_LOCAL_ONLY and avoids dispatching snapshots to QEs.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Gpexpand use gp_add_segment to register primaries. · e2c699c8
      Ashwin Agrawal committed
      Currently, the dbid is used in the tablespace path, hence creating a
      segment requires the dbid. To get the dbid, the segment needs to be
      added to the catalog first, but adding a segment to the catalog before
      creating it causes issues. Hence, modify gpexpand to not let the
      database generate the dbid, but instead pass in the dbid generated
      upfront while registering the segment in the catalog. This way the dbid
      used while creating the segment is the same as the dbid in the catalog.
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
    • Add libzstd-devel to docker images (#6787) · bcdcb827
      Lav Jain committed
    • Explicitly pass 0 as number of dead tuples to pgstat when vacuuming AO tables. · 148d718d
      Georgios Kokolatos committed
      An argument can be made that hidden tuples in AO tables are similar to
      dead tuples in regular tables. However, the use of this information with
      regard to pgstats is semantically distinct and consequently should not
      be exposed. As an example, after a VACUUM (FULL, ANALYZE) of an AO
      table, hidden tuples will remain if the AO compaction thresholds are
      not met.

      It seems preferable to explicitly pass 0 instead of the already zeroed
      LVRelStats member, for clarity.
      Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  3. 23 Jan 2019 (15 commits)
    • Fix bug: zombie record in gp_distribution_policy (#6768) · b949ac96
      Jialun committed
      When a table has been transformed into a view by creating an ON SELECT
      rule, the record in gp_distribution_policy should be deleted as well,
      since there is no such record for a view.
      Also, the relstorage in pg_class should be changed to 'v'.
    • Add libzstd to CentOS dependencies README · 78038632
      Dmitriy Dubson committed
      Add missing documentation on the newly required `libzstd` dependency.
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
      Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
    • gp_toolkit.gp_skew_* should support replicated table correctly · 6e862c90
      Pengzhou Tang committed
      The gp_toolkit.gp_skew_* series of views/functions is used to query how
      data is skewed in the database. The idea is to use a query like
      "select gp_segment_id, count(*) cnt from foo group by gp_segment_id"
      and compare the cnt values across segments.

      For a replicated table, only one replica is picked by the planner to
      count the tuple number, so the old calculation logic produced the
      confusing result that a replicated table is skewed, which is not
      expected:

      gpadmin=# select * from gp_toolkit.gp_skew_idle_fractions;
       sifoid | sifnamespace | sifrelname |      siffraction
      --------+--------------+------------+------------------------
        16385 | public       | rpt        | 0.66666666666666666667

      What's more, gp_segment_id is ambiguous for a replicated table, so in
      commit b120194a we disallowed users from accessing system columns,
      including gp_segment_id, and the gp_toolkit.gp_skew_* views now report
      an error.

      This commit corrects the results of the gp_toolkit.gp_skew_*
      views/functions for replicated tables. Although the results are
      pointless, this way is more friendly for users.
    • Remove the obsolete comment for RETURNING and put the test in a parallel... · 00daeffe
      Paul Guo committed
      Remove the obsolete comment for RETURNING and put the test in a parallel
      running group, following pg upstream.
    • Run gp_toolkit early to reduce testing time of it due to less logs. · 5bc0bcb2
      Paul Guo committed
      The gp_toolkit test exercises various log-related views like
      gp_log_system(). If we run the test earlier, fewer logs have been
      generated and thus the test runs faster. In my test environment the
      test time drops from ~22 seconds to 6.x seconds with this patch. I also
      checked the whole test case; this change does not affect the test
      coverage.
    • Declare cursor for update should handle replicated table too · 80f49a18
      Pengzhou Tang committed
      In the 9.0 merge, we added the below rule for FOR UPDATE:

      SELECT FOR UPDATE will lock the whole table; we do it at
      addRangeTableEntry. The reason is that GPDB is an MPP database and the
      result tuples may not be on the same segment. And for a cursor
      statement, the reader gang cannot get the Xid to lock the tuples, so we
      did not add a LockRows node for distributed tables to avoid this.

      This rule should also apply to replicated tables.
    • Synchronize mpp_execute option description and precedence rules in en… (#6734) · 7e0bd349
      David Yozie committed

      * Synchronize mpp_execute option description and precedence rules in end-user documentation

      * describe the order of precedence in each command

      * one any -> any one

      * Feedback from Lisa
    • Parent partition and children partition table must have same columns · d8a613b8
      ZhangJackey committed
      In the previous code, we could modify the parent partition's columns
      by ALTER TABLE ONLY, so the columns of the parent partition
      and the child partitions could differ.

      In order to prohibit this situation, we check the DROP COLUMN /
      ADD COLUMN / ALTER TYPE COLUMN statements to prevent the user from
      modifying only the columns of the parent partition or only those of
      the child partitions.

      There was a discussion on gpdb-dev@:
      https://groups.google.com/a/greenplum.org/forum/#!msg/gpdb-dev/0SzL_gSbqKo/d-2RpwKrFwAJ
    • Delete top-level Dockerfile · ae67ca0f
      Bradford D. Boyle committed
      It doesn't build, because --disable-orca is not being passed to configure
      and pivotaldata/gpdb-devel doesn't have Xerces, on which ORCA depends.

      It seems this Dockerfile is not used. The Dockerfiles in
      ./src/tools/docker/*/Dockerfile are more recently maintained.
      Co-authored-by: Bradford D. Boyle <bboyle@pivotal.io>
      Co-authored-by: Ben Christel <bchristel@pivotal.io>
    • CI: Remove extra sles11 task input for RC job · ca26fb34
      Kris Macoskey committed
      For GPDB 6 Beta, only CentOS 6/7 need to be passing for the same commit
      to be a valid release candidate.

      This was originally done in commit fa63e7ab, but that commit was
      missing an update to the task yaml for the Release_Candidate job to
      accommodate the removal of the sles11 input.
      Authored-by: Kris Macoskey <kmacoskey@pivotal.io>
    • a9cd61e0
    • Validation for gp_dbid and gp_contentid between QD catalog and QE. · 78aed203
      Ashwin Agrawal committed
      Since gp_dbid and gp_contentid are stored in conf files on the QE, it
      is helpful to have validation comparing the values with the QD catalog
      table gp_segment_configuration. This validation is performed using
      FTS. The FTS message includes the gp_dbid and gp_contentid values from
      the catalog; the QE validates the values while handling the FTS message
      and PANICs if it finds an inconsistency.

      This check is mostly targeted at development, to catch missed handling
      of gp_dbid and gp_contentid values in config files, for future features
      like pg_upgrade and gpexpand which copy the master directory and
      convert it to a segment.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
    • Delete gpsetdbid.py and gp_dbid.py. · 549cd61c
      Ashwin Agrawal committed
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
    • Store gp_dbid and gp_contentid in conf files. · 4eaeb7bc
      Ashwin Agrawal committed
      Currently, gp_dbid and gp_contentid are passed as command line
      arguments when starting a QD or QE. Since the values are stored in the
      master's catalog table, the master must be started first to get the
      right values. Hence, a hard-coded dbid=1 was always used for starting
      the master in admin mode. This worked fine as long as the dbid was not
      used for anything on disk. But given that the dbid is used in the
      tablespace path in GPDB 6, starting the instance with the wrong dbid
      invites recovery-time failures, data corruption or data loss. Dbid=1
      goes wrong after failover to the standby master, as it has dbid != 1.
      This commit hence eliminates the need to pass gp_dbid and gp_contentid
      on the command line; instead, the values are stored in conf files while
      creating the instance.

      This also helps to avoid passing gp_dbid as an argument to pg_rewind,
      which needs to start the target instance in single user mode to
      complete recovery before performing the rewind operation.

      Plus, this eases development: just use pg_ctl start without having to
      correctly pass these values.

       - gp_contentid is stored in the postgresql.conf file.

       - gp_dbid is stored in internal.auto.conf.

       - Introduce the internal.auto.conf file, created during
         initdb. internal.auto.conf is included from the postgresql.conf file.

       - A separate file is chosen for gp_dbid to ease handling during
         pg_rewind and pg_basebackup, as this file can be excluded from the
         copy from primary to mirror, instead of trying to edit its contents
         after the copy during these operations. gp_contentid stays the same
         for a primary and its mirror, hence having it in postgresql.conf
         makes sense. If gp_contentid were also stored in the new
         internal.auto.conf file, pg_basebackup would need to be passed the
         contentid as well to write to this file.

       - pg_basebackup: write the gp_dbid after backup. Since gp_dbid is
         unique for a primary and its mirror, pg_basebackup excludes copying
         the internal.auto.conf file storing the gp_dbid. pg_basebackup
         explicitly (over)writes the file with the value passed as
         --target-gp-dbid, which is therefore now a mandatory argument to
         pg_basebackup.

       - gpexpand: update gp_dbid and gp_contentid after the directory copy.

       - pg_upgrade: retain all configuration files for the
         segment. postgresql.auto.conf and internal.auto.conf are also
         internal configuration files which should be restored back after the
         directory copy. A similar change is required in the gp_upgrade repo
         in restoreSegmentFiles() after copyMasterDirOverSegment().

       - Update tests to avoid passing gp_dbid and gp_contentid.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
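A hypothetical sketch of the file layout this commit describes on a segment data directory (values made up for illustration):

```
# $PGDATA/postgresql.conf
gp_contentid = 0                  # same for a primary and its mirror
include 'internal.auto.conf'      # pulls in the per-instance dbid

# $PGDATA/internal.auto.conf  (excluded from pg_basebackup's copy,
# rewritten via --target-gp-dbid)
gp_dbid = 2                       # unique per instance
```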
    • gpinitsystem: add mirror to catalog first and then create them. · f6e85f1f
      Ashwin Agrawal committed
      To create a mirror, a pg_basebackup needs to be performed, and
      pg_basebackup needs the dbid as an argument to correctly handle
      tablespaces. This requirement exists because the dbid is used in the
      tablespace path.

      For the dbid in the master catalog to be in sync with what the mirror
      uses for tablespaces, the mirror needs to be added to the catalog
      first. Get the dbid and pass it to pg_basebackup when creating the
      mirror.
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
  4. 22 Jan 2019 (3 commits)
    • Remove the case of external partition · 9cef1c91
      Adam Lee committed
      pg_upgrade doesn't like it; please revert this commit once the
      restriction is removed.
      
      ```
      Checking for external tables used in partitioning           fatal
      
      | Your installation contains partitioned tables with external
      | tables as partitions.  These partitions need to be removed
      | from the partition hierarchy before the upgrade.  A list of
      | external partitions to remove is in the file:
      | 	external_partitions.txt
      
      Failure, exiting
      ```
    • pg_dump: dump the namespace while processing external partitions · bbb9f9dd
      Adam Lee committed
      We forgot to dump the namespace while processing external partitions.
      This would be a problem since upstream pg_dump decided not to dump the
      search_path; this commit fixes it.
    • Fix gppkg error when master and standby master are in the same node · 1f33759b
      Haozhou Wang committed
      If both the master and the standby master are set up on the same node,
      the gppkg utility reports an error when uninstalling a gppkg. This is
      because the gppkg utility assumes the master and standby master are on
      different nodes, which may not be true in a test environment.

      This patch fixes the issue: when the master and standby master are on
      the same node, we skip installing/uninstalling the gppkg on the standby
      master node.
  5. 21 Jan 2019 (2 commits)
    • Remove GPDB_93_MERGE_FIXME (#6699) · d286b105
      Shaoqi Bai committed
      The code was added to tackle the following case: when FTS sends a
      promote message, the mirror creates the PROMOTE file and is signaled to
      promote. But while the mirror is still under promotion and not yet
      done, FTS may send promote again, which creates the PROMOTE file
      again. Now, this PROMOTE file exists on the promoted mirror which is
      acting as primary. So, if a basebackup was taken from this primary to
      create a mirror, it included the PROMOTE file and auto-promoted the
      mirror on creation, which is incorrect. Hence, code was added to detect
      via FTS whether this file exists and delete the PROMOTE file, along
      with pg_basebackup excluding the PROMOTE file from the copy.

      Now, given that background, the upstream commit to always delete the
      PROMOTE file on postmaster start covers even the case where the PROMOTE
      file is created after mirror promotion and gets copied over by
      pg_basebackup: on mirror startup there is no risk of auto-promotion.
      So we can safely remove this code now.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Paul Guo <pguo@pivotal.io>
    • Use the right rel for largest_child_relation(). · 8712da1e
      Richard Guo committed
      Function largest_child_relation() is used to find the largest child
      relation of an inherited/partitioned relation, recursively. Previously
      we passed the wrong rel as its parameter.

      This patch finds in root->simple_rel_array the right rel for
      largest_child_relation(). It also replaces several rt_fetch calls with
      a lookup in root->simple_rte_array.

      This patch fixes #6599.
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  6. 19 Jan 2019 (1 commit)
    • docs - reorg pxf content, add multi-server, objstore content (#6736) · f601572d
      Lisa Owen committed
      * docs - reorg pxf content, add multi-server, objstore content
      
      * misc edits, SERVER not optional
      
      * add server, remove creds from examples
      
      * address comments from alexd
      
      * most edits requested by david
      
      * add Minio to table column name
      
      * edits from review with pxf team (start)
      
      * clear text credentials, reorg objstore cfg page
      
      * remove steps with XXX placeholder
      
      * add MapR to supported hadoop distro list
      
      * more objstore config updates
      
      * address objstore comments from alex
      
      * one parquet data type mapping table, misc edits
      
      * misc edits from david
      
      * add mapr hadoop config step, misc edits
      
      * fix formatting
      
      * clarify copying libs for MapR
      
      * fix pxf links on CREATE EXTERNAL TABLE page
      
      * misc edits
      
      * mapr paths may differ based on version in use
      
      * misc edits, use full topic name
      
      * update OSS book for pxf subnav restructure