1. 24 Jun 2020, 1 commit
  2. 22 Jun 2020, 1 commit
    • Fix flaky appendonly test. · f860ff0c
      (Jerome)Junfeng Yang authored
      This fixes the following test failure:
      ```
      --- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/appendonly.out	2020-06-16 08:30:46.484398384 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/appendonly.out	2020-06-16 08:30:46.556404454 +0000
      @@ -709,8 +709,8 @@
         SELECT oid FROM pg_class WHERE relname='tenk_ao2'));
             case    | objmod | last_sequence | gp_segment_id
              -----------+--------+---------------+---------------
            + NormalXid |      0 | 1-2900        |             1
              NormalXid |      0 | >= 3300       |             0
            - NormalXid |      0 | >= 3300       |             1
              NormalXid |      0 | >= 3300       |             2
              NormalXid |      1 | zero          |             0
              NormalXid |      1 | zero          |             1
      ```
      
      The flakiness is because, under ORCA, a `CREATE TABLE` statement without
      `DISTRIBUTED BY` treats the table as randomly distributed, whereas the
      planner treats it as distributed by the table's first column.
      
      ORCA:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL
      policy entry.
      ```
      
      Planner:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s)
      named 'unique1' as the Greenplum Database data distribution key for this
      table.
      ```
      
      So the data distribution of table tenk_ao2 is not what the test expects.
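      
      For reference, the usual way to make such a test deterministic is to pin the
      distribution explicitly; a minimal sketch, assuming `unique1` is the intended
      key (as the planner NOTICE above suggests):
      ```
      -- Sketch: an explicit DISTRIBUTED BY gives ORCA and the planner the same
      -- distribution, so the per-segment rows in the expected output stay stable.
      CREATE TABLE tenk_ao2 WITH (appendonly=true, compresslevel=0, blocksize=262144)
      AS SELECT * FROM tenk_heap
      DISTRIBUTED BY (unique1);
      ```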
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
  3. 08 Jun 2020, 1 commit
    • Retire guc gp_session_role (#9396) · f6297b96
      Paul Guo authored
      Use only the guc gp_role now, and fold the functionality of guc gp_session_role
      into it. Previously we had both gucs. The difference between the two gucs is
      (copied from a code comment):
      
       * gp_session_role
       *
       * - does not affect the operation of the backend, and
       * - does not change during the lifetime of PostgreSQL session.
       *
       * gp_role
       *
       * - determines the operating role of the backend, and
       * - may be changed by a superuser via the SET command.
      
      This is not friendly for coding. For example, Gp_role and Gp_session_role are
      both set to GP_ROLE_DISPATCH on the postmaster and on many aux processes on all
      nodes (even QE nodes) in a cluster, so to tell a QD postmaster apart from a QE
      postmaster, current gpdb passes an additional -E option in the postmaster
      arguments. This is confusing for developers writing role-dependent code, given
      that we have three related variables. Also, some related code is even buggy now
      (e.g. 'SET gp_role' FATAL-quits).
      
      With this patch we just have gp_role now. Some changes which might be
      interesting in the patch are:
      
      1. For postmaster, we should specify '-c gp_role=' (e.g. via pg_ctl argument) to
         determine the role else we assume the utility role.
      
      2. For stand-alone backend, utility role is enforced (no need to specify by
         users).
      
      3. Could still connect QE/QD nodes using utility mode with PGOPTIONS, etc as
         before.
      
      4. Remove the '-E' gpdb hacking and align the '-E' usage with upstream.
      
      5. Move pm_launch_walreceiver out of the fts-related shmem, given the latter is
         not used on QE.
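      
      As a rough illustration of the single-GUC behavior described above (not part
      of the patch text itself), the operating role of a session can simply be
      inspected; a utility-mode connection is the kind reached via PGOPTIONS as in
      point 3:
      ```
      -- Sketch: show the operating role of the backend you are connected to.
      -- A normal QD connection is expected to report 'dispatch'; a connection
      -- started with -c gp_role=utility reports 'utility'.
      SHOW gp_role;
      ```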
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Gang Xiong <gxiong@pivotal.io>
      Reviewed-by: Hao Wu <gfphoenix78@gmail.com>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
  4. 03 Jun 2020, 1 commit
    • Allow CLUSTER on append-optimized tables · 179feb77
      Andrey Borodin authored
      CLUSTER on AO tables is implemented by sorting the entire AO table with the
      tuplesort framework, according to a btree index defined on the table.
      
      A faster way to cluster is to scan the tuples in index-order, but this
      requires index-scan support.  Append-optimized tables do not support
      index-scans currently, but when this support is added, the cluster
      operation can be enhanced accordingly.
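      
      A minimal usage sketch (table and index names are made up):
      ```
      -- Sketch: clustering an append-optimized table by a btree index.
      CREATE TABLE ao_sales (id int, amount numeric)
          WITH (appendonly=true) DISTRIBUTED BY (id);
      CREATE INDEX ao_sales_id_idx ON ao_sales (id);
      -- The whole table is rewritten in index order via tuplesort:
      CLUSTER ao_sales USING ao_sales_id_idx;
      ```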
      
      Author: Andrey Borodin <amborodin@acm.org>
      Reviewed and slightly edited by: Asim R P <pasim@vmare.com>
      
      Merges GitHub PR #9996
  5. 31 May 2020, 2 commits
    • Allow AOCO tables to inherit a parent table. · 177abc76
      Heikki Linnakangas authored
      Because why not? This was forbidden back in 2012, with an old JIRA ticket
      that said:
      
          MPP-15735 - Inheritance is not supported with column oriented table
      
          This shouldn't be possible:
      
          Create table parent_tb2 (a1 int, a2 char(5), a3 text, a4 timestamp,
                 a5 date, column a1 ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768))
            with (appendonly=true,orientation=column)
            distributed by (a1);
      
          Create table child_tb4 (c1 int, c2 char) inherits(parent_tb2)
            with (appendonly = true, orientation = column);
      
          The reason is, dealing with column compression and inheritance is tricky
          and inheritance shouldn't even be supported.
      
      That explanation didn't go into any details, but I can see the trickiness.
      There are many places you can specify column compression options even
      without inheritance: in gp_default_storage_options GUC, in an ENCODING
      directive in CREATE TABLE command, in a "COLUMN foo ENCODING ..."
      directive in the CREATE TABLE command, or in the datatype's default
      storage options. Inheritance adds another dimension to it: should the
      current gp_default_storage_options override the options inherited from the
      parent? What if there are multiple parents with conflicting options?
      
      The easiest solution is to never inherit the column encoding options from
      the parent. That's a bit lame, but there's some precedent for that with
      partitions: if you ALTER TABLE ADD PARTITION, the encoding options are
      also not copied from the parent. So that's what this patch does.
      
      One interesting corner case is to specify the options for an inherited
      column with "CREATE TABLE (COLUMN parentcol ENCODING ...) INHERITS (...)".
      Thanks to the previous refactoring, that works, even though 'parentcol'
      is not explicitly listed in the CREATE TABLE command but inherited from
      the parent. To make dump & restore of that work correctly, modify pg_dump
      so that it always uses the "COLUMN foo ENCODING ..." syntax to specify
      the options, instead of just tacking "ENCODING ..." after the column
      definition.
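      
      A sketch of that corner case (names and encoding options are illustrative):
      ```
      -- Sketch: specifying ENCODING for a column that exists only in the parent.
      CREATE TABLE parent_aoco (a int, b text)
          WITH (appendonly=true, orientation=column) DISTRIBUTED BY (a);
      CREATE TABLE child_aoco (c int, COLUMN a ENCODING (compresstype=zlib, compresslevel=5))
          INHERITS (parent_aoco)
          WITH (appendonly=true, orientation=column) DISTRIBUTED BY (a);
      ```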
      
      One excuse for doing this right now is that even though we had forbidden
      "CREATE TABLE <tab> INHERITS (<parent>)" on AOCO tables, we missed "ALTER
      TABLE <tab> INHERIT <parent>". That was still allowed, and if you did
      that you got an inherited AOCO table that worked just fine, except that
      when it was dumped with pg_dump, the dump was unrestorable. It would have
      been trivial to add a check to forbid "ALTER TABLE INHERIT" on an AOCO
      table, instead, but it's even better to allow it.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/10111
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Refactor the way ENCODING clauses are parsed. · 391b78eb
      Heikki Linnakangas authored
      Instead of parsing them in the parse analysis phase, delay it until
      DefineRelation(), after merging the definitions of inherited attributes.
      We don't support INHERIT clause on an AOCO table today, but if we lift
      that limitation (as I'm planning to do in a follow-up commit), we need to
      be able to apply "COLUMN <col> ENCODING ..." clauses to inherited columns
      too.
      
      Before this, all callers of create_ctas_internal() had to also call
      AddDefaultRelationAttributeOptions(), in case it was an AOCO table. Now
      that DefineRelation() handles the default storage options, that's no
      longer needed.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  6. 22 May 2020, 1 commit
  7. 19 May 2020, 1 commit
    • Remove the pg_exttable catalog · e4b499aa
      Adam Lee authored
      Keep the syntax of external tables, but store them in the catalog as foreign
      tables with options.
      
      While using it, transform the foreign table options to an ExtTableEntry,
      which is compatible with external_table_shim, PXF and other custom
      protocols.
      
      Also, add `DISTRIBUTED` syntax support for `CREATE FOREIGN TABLE` if
      the foreign server indicates it's an external table.
      
      Note:
      1. All permissions handling is still in pg_authid, to keep the external tables'
         GRANT queries working; that will be moved in another PR if possible.
      2. Multiple URI locations are stored in the foreign table options, separated
         by `|`.
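      
      As a hedged sketch of where the definition now lives (the table name is made
      up, and actual option names may differ):
      ```
      -- Sketch: after this change an external table's options are stored in
      -- pg_foreign_table.ftoptions; multiple locations appear in a single option
      -- value separated by '|'.
      SELECT ft.ftoptions
      FROM pg_foreign_table ft
      JOIN pg_class c ON c.oid = ft.ftrelid
      WHERE c.relname = 'my_external_table';
      ```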
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
  8. 28 Apr 2020, 1 commit
  9. 26 Apr 2020, 1 commit
  10. 20 Mar 2020, 1 commit
    • Enable external table's error log to be persistent for ETL. (#9757) · 04fdd0a6
      (Jerome)Junfeng Yang authored
      In ETL scenarios, the same external table may be created and dropped
      frequently, and once the external table is dropped, all errors stored in its
      error log are lost.
      
      To make the error log persistent for external tables with the same
      "dbname"."namespace"."table", introduce the "error_log_persistent" external
      table option. If the external table is created with
      `OPTIONS (error_log_persistent 'true')` and `LOG ERRORS`, its error log is
      named "dbid_namespaceid_tablename" under the "errlogpersistent" directory,
      and dropping the external table does not delete that error log.
      
      Since GPDB 5 and 6 still use pg_exttable's options to mark
      LOG ERRORS PERSISTENTLY, keep the ability to load from
      OPTIONS (error_log_persistent 'true').
      
      Create a separate `gp_read_persistent_error_log` function to read the
      persistent error log. If the external table has been deleted, only the
      namespace owner has permission to read the error log.
      
      Create a separate `gp_truncate_persistent_error_log` function to delete the
      persistent error log. If the external table has been deleted, only the
      namespace owner has permission to delete the error log. It also supports
      wildcard input to delete the error logs belonging to a database or the whole
      cluster.
      
      If an external table created with `error_log_persistent` is dropped, and the
      same "dbname"."namespace"."table" external table is then created without
      persistent error logging, it writes errors to the normal error log; the
      persistent error log still exists.
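      
      A rough usage sketch, assuming a gpfdist location; the exact clause placement
      and the function arguments shown are assumptions, not taken from the patch:
      ```
      -- Sketch: create an external table whose error log survives DROP, then
      -- read and truncate the persistent log with the new functions.
      CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
          LOCATION ('gpfdist://etlhost:8081/sales*.txt')
          FORMAT 'TEXT'
          OPTIONS (error_log_persistent 'true')
          LOG ERRORS SEGMENT REJECT LIMIT 100;
      
      SELECT * FROM gp_read_persistent_error_log('ext_sales');    -- read entries
      SELECT gp_truncate_persistent_error_log('ext_sales');       -- delete entries
      ```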
      Reviewed-by: Haozhou Wang <hawang@pivotal.io>
      Reviewed-by: Adam Lee <ali@pivotal.io>
  11. 17 Mar 2020, 1 commit
    • Simplify the implementation of "dedup" semi-join paths. · 9628a332
      Heikki Linnakangas authored
      There are two parts to this commit:
      
      1. Instead of using ctids and "fake" ctids to disambiguate rows that have
         been duplicated by a Broadcast motion, generate a synthetic 64-bit
         rowid specifically for that purpose, just below the Broadcast. The old
         method had to generate subplan IDs, fake ctids etc. to have a
         combination of columns that formed the unique ID, and that entailed a
         lot of infrastructure, like the whole concept of "pseudo columns". The
         new method is much simpler, and it works the same way regardless of
         what's below the Broadcast. The new 64-bit rowids are generated by
         a new RowIdExpr expression node. The high 16 bits of the rowid are
         the segment ID, and the low bits are generated by a simple counter.
      
      2. Replace the cdbpath_dedup_fixup() post-processing step on the Plan
         tree, by adding the unique ID column (now a RowIdExpr) in the Path's
         pathtarget in the first place. cdbpath_motion_for_join() is the
         function that does that now. Seems like a logical place, since that's
         where any needed Motions are added on top of the join inputs. The
         responsibility for creating the Unique path is moved from
         cdb_add_join_path() to the create_nestloop/hashjoin/mergejoin_path()
         functions.
      
      This slightly changes the plan when one side of the join has Segment or
      SegmentGeneral locus. For example:
      
      select * from srf() r(a) where r.a in (select t.a/10 from tab t);
      
      With this commit, that's implemented by running the FunctionScan and
      generating the row IDs on only one node, and redistributing/broadcasting
      the result. This is not exactly the same as before this PR: previously, if
      the function was stable or immutable, we assumed that a SRF returns rows
      in the same order on all nodes and relied on that when generating the
      "synthetic ctid" on each row. We no longer assume that, but force the
      function to run on a single node instead. That seems acceptable: this kind of
      plan is helpful when the other side of the join is much larger, so
      redistributing the smaller side is probably still OK.
      
      This also uses the same strategy when the inner side is a replicated table
      rather than a SRF. That was not supported before, but there was a TODO
      comment that hinted at the possibility.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/9741, and adds tests
      for it.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
  12. 09 Mar 2020, 1 commit
    • Manage AO segments in QE nodes, instead of the master. · 5778f311
      Heikki Linnakangas authored
      Overview
      --------
      
      Manage the allocation of AO segfiles in QE nodes, instead of the master.
      That simplifies a lot of things. We no longer need to send the decision
      on which segfile to use from QD to QE, we no longer need to keep up-to-date
      tuple counts in the master, and so forth.
      
      This gets rid of the in-memory hash table to track AO segment status. To
      lock a segfile for insertion or other operations, we now rely on tuple
      locks on the AO segment table instead.
      
      VACUUM code has been refactored. I tried to make the code structure closer
      to upstream, to reduce merge conflicts as we merge with upstream. The way
      the VACUUM phases are dispatched, and the phases needed, are different in
      the new regime. Some phases could use local transactions, but others have
      to still use distributed transactions, but for simplicity we use
      distributed transactions for all of the phases.
      
      Now that QE nodes choose the insertion target segfile independently, we
      could use the same algorithm for choosing the target segfile in utility
      mode, and wouldn't need to reserve segfile #0 for special operations
      anymore. But to keep things familiar, and to avoid any more churn in the
      regression tests, this keeps the historical behavior and still reserves
      segfile #0 for utility mode and CREATE TABLE AS.
      
      Previously, SERIALIZABLE (and REPEATABLE READ) transactions were treated
      differently from transactions in READ COMMITTED mode. If there was an
      old serializable transaction running, VACUUM refrained from recycling
      old AO segfiles after compaction, because the serializable transaction
      might lock the table later and try to read the rows that were compacted
      away. But that was not completely waterproof; a READ COMMITTED transaction
      that's still holding onto an old snapshot is no different from a
      serializable transaction. There's a gap between acquiring a snapshot and
      locking a table, so even if a READ COMMITTED backend is not holding a lock
      on a table right now, it might open it later. To fix that, remove the
      distinction between serializable and other transactions. Any open
      transaction will now prevent AO segfiles in awaiting-drop state from being
      removed. This commit adds an isolation2 test case called
      'vacuum_self_function' to test one such case; it used to fail before this
      commit.
      
      That is a user-visible change in behavior that's worth calling out
      explicitly. If there are concurrent transactions running, even in
      READ COMMITTED mode, and even if they don't access the table at all,
      VACUUM will no longer be able to recycle compacted segfiles. Next VACUUM
      will clean them up, after the old concurrent transactions. Meanwhile,
      the awaiting-drop segfiles will use up disk space, and increase the risk
      that an insertion fails because all segfiles are in use.
      
      This work was originally submitted as GitHub PR #790; you might see references
      to that in the discussions.
      
      Code change details
      -------------------
      
      This gets rid of the in-memory hash table to track AO segment status. To
      lock a segfile for insertion or other operations, we now rely on tuple
      locks on the AO segment table instead. When an AO segment is chosen as
      insertion target, its AO segment table row is locked for the transaction
      (by heap_lock_tuple, i.e. by stamping its XMAX). Later, when the insertion
      finishes, the tuple is updated with the new EOF (no change there). When
      choosing a new insertion target segfile, we scan the pg_aoseg table instead
      of the hash table, and use the xmin/xmax to determine which ones are safe
      to use. The process of choosing a target segment itself is not
      concurrency-safe, so the relation extension lock is held to protect that.
      See new "Locking and snapshots" section in
      src/backend/access/appendonly/README.md for details.
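      
      To get a feel for the state that is now consulted directly, one can peek at a
      table's AO segment catalog; a sketch, where 16400 stands in for the AO table's
      actual segrel OID:
      ```
      -- Sketch: the per-table pg_aoseg relation that QEs now scan and tuple-lock
      -- when choosing an insertion target segfile.
      SELECT segno, eof, tupcount, modcount, state
      FROM pg_aoseg.pg_aoseg_16400;
      ```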
      
      tupcount and modcount are not kept up-to-date in the master. Remove all the
      code needed to send the totals back from QE to QD at end of a command,
      including the GPDB-specific 'o' protocol FE/BE message.
      
      Remove the machinery to track which PROCs are in a serializable transaction.
      It was not completely waterproof anyway, and there were some existing bugs
      in that area. A READ COMMITTED transaction that's still holding onto an old
      snapshot is no different from a serializable transaction. This commit adds
      an isolation2 test case called 'vacuum_self_function' to test one such case;
      it used to fail before this commit.
      
      Move responsibility of creating AO segfile to callers of
      appendonly_insert_init(). The caller must have locked, and created if
      necessary, the segfile by calling ChooseSegnoForWrite() or
      LockSegnoForWrite(). Previously, the pg_aoseg was not locked in rewriting
      ALTER TABLE and other such commands that always used segfile #0. That was
      OK, because they held an AccessExclusiveLock on the table, but let's be
      consistent.
      
      When selecting the insertion target for VACUUM compaction, prefer creating
      a new segfile over using an existing non-empty segfile. Existing empty
      segfiles are still choice #1, but let's avoid writing to non-empty
      existing files, because otherwise we cannot vacuum those segfiles in the
      same VACUUM cycle.
      
      Warn, if compaction fails because no insertion segfile could be chosen.
      This used to be an ERROR in the old code, but I think a WARNING is more
      appropriate.
      
      In targetid_get_partition(), don't create duplicate ResultRelInfo entries.
      It's possible that the "result relation" is a partition of a partitioned
      table, rather than the root. Don't create a new ResultRelInfo hash entry
      for it in that case, use the parent ResultRelInfo. This was harmless
      before, but now the logic of when we need to increment the modcount was
      getting confused. Per the 'qp_dropped_cols' regression test.
      
      Test change details
      -------------------
      
      src/test/regress/greenplum_schedule
      
          Run uao[cs]_catalog_tables tests separately.
      
          Because we are now more lazy with dropping AWAITING_DROP segments, the
          issue mentioned in the comment with serializable transactions now
          applies to all transactions.
      
      uao_compaction/threshold
      uaocs_compaction/threshold
      
          Adjust the uao[cs]_compaction/threshold test cases to make them pass.
      
          In the first few changed outputs, I'm not entirely sure why one segment
          chooses to insert the new tuples to segfile #2 while others pick
          segfile #1, but there's also nothing wrong with that. The point of this
          PR is that QE nodes can make the decision independently.
      
          In the last test with gp_toolkit.__gp_aovisimap_compaction_info(),
          all the test data is inserted to one QE node, so segfile 2 isn't
          allocated on others.
      
      isolation2/vacuum_self_function
      
          Add test case for the non-SERIALIZABLE case discussed earlier.
      
          See also discussion at
          https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ
      
          This test errored out before this commit:
      
          ERROR:  read beyond eof in table "ao" file "base/16384/16387.1", read position 0 (small offset 0), actual read length 0 (large read length 192) (cdbbufferedread.c:211)
      
          Works with this PR, but we don't drop old AO segments as aggressively
          in vacuum anymore. Partly because we're now more careful and correct.
          But also because we don't detect the case that the VACUUM is the oldest
          transaction in the system, and nothing else is running; we miss the
          opportunity to drop compacted AO segments immediately in that case. The
          next VACUUM should drop them though.
      
          Furthermore, this test is written so that if we tried to perform AO
          compaction phase in a local rather than distributed transaction, it
          would fail. (I know because I tried to do it that way at first.) This
          is "Problem 3" that I mentioned on that gpdb-dev thread.
      
      isolation2/fsync_ao
      
        Adjust "hit counts" in fsync_ao test.
      
        The number of times the fsyncs happen has changed. To be honest, I'm not
        sure why, nor what the principled way to get the correct value would be,
        but as long as the counts are > 0, this looks like success to me.
      
      isolation2/gdd/planner_insert_while_vacuum_drop
      
        Remove GDD test that's not relevant anymore.
      
        The test was testing a deadlock between QD and QE, involving AO vacuum.
        We no longer take an AccessExclusiveLock in the QD, so the test case
        doesn't work. I'm not sure what an equivalent deadlock might be with the
        new AO vacuum, so just remove the test.
      
      isolation2/vacuum_drop_phase_ao
      isolation2/uao/insert_should_not_use_awaiting_drop
      
          These tests don't make sense in the QE-managed regime anymore. I
          couldn't figure out what these should look like with this commit, so
          remove.
      
      isolation2/uao/insert_policy
      
          Update test case to cover a scenario where two transactions try to
          create the initial segment for relation. We had a bug in that scenario
          during development, and it wasn't covered by any existing tests.
      
      isolation2/uao/select_while_vacuum_serializable2
      
          The behavior is different now, a serializable transaction doesn't
          prevent compaction phase from happening, only the recycling of the
          segfile. Adjust expected output accordingly.
      
      isolation2/uao/compaction_utility
      
          VACUUM in utility mode can compact now.
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Xin Zhang <xzhang@pivotal.io>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/lUwtNxaT3f0/Cg39CP8uAwAJ
  13. 04 Mar 2020, 1 commit
    • Fix the way we check if a partition table is empty in Relcache translator · 12011396
      Abhijit Subramanya authored
      The relation statistics object stores a flag which indicates whether a relation
      is empty. However, this was being set based on the value of the stats_empty
      flag, which was set in the function `cdb_estimate_partitioned_numtuples`.
      It would be set to true if any of the child partitions had their relpages set
      to 0. But if other partitions had data in them, then the relation should not be
      considered empty. We should actually be checking the total number of rows
      to determine whether a partition table is empty. Also, the stats_empty flag
      was not being used for anything else, so this commit removes it.
  14. 31 Jan 2020, 3 commits
    • Refactor away special code for external tables in planner and executor · a55b9453
      Heikki Linnakangas authored
      This makes external tables less of a special case throughout the planner
      and executor. They're now mostly handled through the FDW API callbacks.
      
      * Add a FDW handler function and implement the API routines to construct
        ForeignScan paths and Plan nodes, iterate through a scan, and to insert
        tuples to the external/foreign table. The API routines just call through
        to the existing external table functions, like external_getnext() and
        external_insert().
      
      * Remove ExternalScan plan node in the executor and the ExternalScan path
        node from the planner. Use ForeignScan plan and path nodes instead, like
        for any other FDW. Move code related to external table planning to a
        new exttable_fdw_shim.c file.
      
      * The parameters previously carried in the  ExternalScan struct are now in
        a new ExternalScanInfo struct. (Or, the ExternalScan struct has been
        renamed to ExternalScanInfo, if you want to think of it that way.) It's
        not a plan node type anymore, but it still needs read/out function
        support so that the parameters can be serialized. Alternatively, the
        parameters could be carried in Lists of Values and other existing
        expression types, but a struct seems easier to handle. Perhaps the
        cleanest solution would be to use the ExtensibleNode infrastructure for
        it, but I'll leave that for another patch. As long as the external table
        FDW is in the backend anyway, it's simplest to have the out/read/copy
        funcs built-in as well.
      
      * Modify ForeignScan executor node so that it can make up "fake ctids" like
        ExternalScan did, and also add "squelching" support to it.
      
      * Remove special handling of external tables from ModifyTable node. It now
        uses the normal FDW routines for it.
      
      COPY still calls directly into the external_insert() function, because
      PostgreSQL doesn't support COPY into a foreign table until version 11. (We
      don't seem to have any tests for COPY TO/FROM an external table. TODO: add
      test.)
    • Change the catalog representation for external tables. · b62e0601
      Heikki Linnakangas authored
      External tables now use relkind='f', like all foreign tables. They have
      an entry in pg_foreign_table, as if they belonged to a special foreign
      server called "exttable_server". That foreign server gets special
      treatment in the planner and executor, so that we still plan and execute
      it the same as before.
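      
      A hedged sketch of how that now looks in the catalog (the table name is made
      up):
      ```
      -- Sketch: an external table is now a foreign table bound to the special
      -- "exttable_server" foreign server.
      SELECT c.relname, c.relkind, s.srvname
      FROM pg_class c
      JOIN pg_foreign_table ft ON ft.ftrelid = c.oid
      JOIN pg_foreign_server s ON s.oid = ft.ftserver
      WHERE c.relname = 'my_ext_table';
      -- expected: relkind = 'f', srvname = 'exttable_server'
      ```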
      
      * ALTER / DROP EXTERNAL TABLE is now mapped to ALTER / DROP FOREIGN TABLE.
        There is no "OCLASS_EXTTABLE" anymore. This leaks through to the user
        in error messages, e.g:
      
          postgres=# drop external table boo;
          ERROR:  foreign table "boo" does not exist
      
        and to the command tag on success:
      
          postgres=# drop external table boo;
          DROP FOREIGN TABLE
      
      * psql \d now prints external tables as Foreign Tables.
      
      Next steps:
      * Use the foreign table API routines instead of special casing
        "exttable_server" everywhere.
      
      * Get rid of the pg_exttable table, and store the all the options in
        pg_foreign_table.ftoptions instead.
      
      * Get rid of the extra fields in pg_authid to store permissions to
        create different kinds of external tables. Store them as ACLs in
        pg_foreign_server.
    • Remove restriction that you cannot use ALTER TABLE on external tables. · aac5391a
      Heikki Linnakangas authored
      In PostgreSQL, ALTER TABLE is allowed where ALTER FOREIGN TABLE is. Allow
      external tables the same leniency.
  15. 14 Jan 2020, 1 commit
    • Forgot to remove WARNING from expected output. · bb1c67f7
      Heikki Linnakangas authored
      Commit 25a7630e silenced printing a lot of spurious "time constraints
      added on superuser rule" warnings from the log. But there was also one
      such warning memorized in the expected output, which was overlooked by
      that commit. Remove the warning.
      
      The warning was memorized in the expected output in the 9.0 merge
      (commit 5ec5c30a), but it should never have been printed. Making a role
      superuser with "ALTER ROLE <role> SUPERUSER" drops all
      pg_auth_time_constraint entries for the role, so this
      check_auth_time_constraints() call shouldn't print such a warning.
  16. 11 Jan 2020, 1 commit
    • Refactor the late "parallelization" stages of the planner. · 93abe741
      Heikki Linnakangas authored
      * Build a preliminary "planner slice table" at the end of planning, and
        attach it to the PlannedStmt. Executor startup turns that into the
        final executor slice table. This replaces the step where executor
        startup scanned the whole Plan tree to build the slice table.
      
      * Now that the executor startup gets a pre-built planner slice table, it
        doesn't need the Flow structures for building the slice table anymore.
        Also refactor the few other remaining places in nodeMotion.c and
        nodeResult.c that accessed the Flows to use the information from the
        slice table instead. The executor no longer looks at the Flows at all,
        so we don't need to include them in the serialized plan tree anymore.
        ORCA translator doesn't need to build Flow structures anymore either.
        Instead, it now builds the planner slice table like the Postgres planner
        does.
      
      * During createplan.c processing, keep track of the current "slice", and
        attach direct dispatch and other per-slice information to the PlanSlice
        struct directly, instead of carrying it in the Flow structs. This
        renders the Flows mostly unused in the planner too, but there is still
        one thing we use the Flows for: to figure out when we need to add a
        Motion on top of a SubPlan's plan tree, to make the subplan's result
        available in the slice where the SubPlan is evaluated. There's a "sender
        slice" struct attached to a Motion during create_plan() processing to
        represent the sending slice. But the slice ID is not assigned at that
        stage yet. Motion / Slice IDs are assigned later, when the slice table
        is created.
      
      * Only set 'flow' on the topmost Plan, and in the child of a Motion.
      
      * Remove unused initplans and subplans near the end of planning, after
        set_plan_references(), but before building the planner slice table. We
        used to remove init plans and subplans a little bit earlier, before
        set_plan_references(), but there was at least one corner case involving
        inherited tables where you could have a SubPlan referring a subplan
        in the plan tree before set_plan_references(), but set_plan_references()
        transformed it away. You would end up with an unused subplan in that
        case, even though we previously removed any unused subplans. This way
        we don't need to deal with unused slices in the executor.
      
      * Rewrite the logic to account direct-dispatch cost saving in plan cost.
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
  17. 03 Jan 2020, 2 commits
    • Forbid DROP PARTITION with no partition name or value. · bf8dc8f4
      Heikki Linnakangas authored
      Previously, this was allowed:
      
          postgres=# ALTER TABLE parttab DROP PARTITION;
          ALTER TABLE
      
      It dropped the first partition, by rank. In other words, it was a
      shorthand for:
      
          ALTER TABLE parttab DROP PARTITION FOR (RANK (1));
      
      We don't need such a shorthand. It's too much of a footgun; if a user
      wants to drop a partition, he should name it properly. The documentation
      doesn't mention this syntax at all, so no docs changes are required.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
    • Fix error scenarios of alter_db_set_tablespace test · 5214fd9b
      Ashwin Agrawal authored
      The alter_db_set_tablespace test has scenarios that inject an error fault for
      content 0 and then run the ALTER DATABASE SET TABLESPACE command. Once the
      error is hit on content 0, the transaction is aborted. Depending on when the
      transaction gets aborted, it is unpredictable how far the command has
      progressed on the non-content-0 primaries. Only if a non-content-0 primary has
      reached the point of copying the directory will its abort record carry a
      database directory deletion to be replayed on the mirror. The test was waiting
      for the directory-deletion fault to be triggered on all content mirrors. This
      expectation is incorrect and makes the test flaky depending on timing.
      
      Hence, modify the error scenarios to only wait for directory deletion on
      content 0, then wait for all the mirrors to replay all the currently generated
      WAL records, and after that make sure the destination directory is empty. This
      should eliminate the flakiness from the test.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  18. 31 Dec 2019, 1 commit
    • Fix flakiness of alter_db_set_tablespace test · 1959053b
      Ashwin Agrawal authored
      The test should make sure the mirror has processed the drop-database WAL
      record before checking that the destination tablespace directory does not
      exist. It skipped that wait for content 0 in the case of a panic after writing
      the WAL record, which is incorrect.
      
      Now wait for all the mirrors to process the WAL record and only then perform
      the validation. This should fix the failures seen in CI with the diff below:
      
      ```
      --- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/alter_db_set_tablespace.out    2019-10-14 16:09:43.638372174 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/alter_db_set_tablespace.out    2019-10-14 16:09:43.714379108 +0000
      @@ -1262,25 +1271,352 @@
       CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 4
       CONTEXT:  PL/Python function "stat_db_objects"
      -NOTICE:  dboid dir for database alter_db does not exist on dbid = 5
      -CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 6
       CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 7
       CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 8
       CONTEXT:  PL/Python function "stat_db_objects"
      - dbid | relfilenode_dboid_relative_path | size
      -------+---------------------------------+------
      -    1 |                                 |
      -    2 |                                 |
      -    3 |                                 |
      -    4 |                                 |
      -    5 |                                 |
      -    6 |                                 |
      -    7 |                                 |
      -    8 |                                 |
      -(8 rows)
      + dbid | relfilenode_dboid_relative_path |  size
      +------+---------------------------------+--------
      +    1 |                                 |
      +    2 |                                 |
      +    3 |                                 |
      +    4 |                                 |
      +    5 | 180273/112                      |  32768
      +    5 | 180273/113                      |  32768
      +    5 | 180273/12390                    |  65536
      +    5 | 180273/12390_fsm                |  98304
      
      <....choping output as very long...>
      
      +    5 | 180273/PG_VERSION               |      4
      +    5 | 180273/pg_filenode.map          |   1024
      +    6 |                                 |
      +    7 |                                 |
      +    8 |                                 |
      +(337 rows)
      
      ```
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  19. 25 Dec 2019, 1 commit
    • icw: fix flaky alter_db_set_tablespace · 54e3af6d
      Ning Yu authored
      The alter_db_set_tablespace test has been flaky for a long time, one
      typical failure is like this:
      
          --- /regress/expected/alter_db_set_tablespace.out
          +++ /regress/results/alter_db_set_tablespace.out
          @@ -1204,21 +1213,348 @@
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 2
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 3
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 4
          -NOTICE:  dboid dir for database alter_db does not exist on dbid = 5
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 6
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 7
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 8
      
      The test disables fts probing with fault injection, but it does not wait for
      the fault to be triggered. The other problem is that fts probing was disabled
      after the PANIC, which might not be in time.
      
      So the problem was a scenario where the fault was injected after the fts loop
      was already past the fault point, and when the subsequent PANIC occurred, fts
      was still active.
      
      By manually triggering, and then by waiting to ensure that the fault is
      hit at least once, we can guarantee that the scenario described above
      doesn't happen.
      Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
  20. 23 Dec 2019, 2 commits
    • Allow exclusion constraints. · cdb61585
      Heikki Linnakangas authored
      Exclusion constraints can be enforced if the table is replicated, or if
      the distribution key columns are part of the constraint, with the same =
      operator as in the distribution key's hash opclass. This is analogous to
      the similar restrictions on unique indexes.
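      
      A minimal sketch of a constraint that satisfies those conditions, assuming the
      btree_gist extension for the integer = operator under GiST:
      ```
      -- Sketch: the distribution key (room) is part of the constraint with the
      -- same = operator, so enforcing the exclusion per segment is sufficient.
      CREATE EXTENSION IF NOT EXISTS btree_gist;
      CREATE TABLE meetings (
          room   int,
          during tsrange,
          EXCLUDE USING gist (room WITH =, during WITH &&)
      ) DISTRIBUTED BY (room);
      ```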
      
      Exclusion constraints on partitioned tables would have similar
      restrictions, but because of the way creation of partitioned tables
      currently works, checking for those restrictions would be tricky. A
      partitioned table is only marked as partitioned after the indexes have
      been created, so we cannot check for compatibility with partitioning keys
      during index creation, like we can with distribution keys. Refactoring
      creation of partitioned tables is more than I can chew right now, so for
      now, continue to disallow exclusion constraints on partitioned tables.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Kalen Krempely <kkrempely@pivotal.io>
    • Fix cross-checks between unique/primary keys and distribution keys. · a2b1050d
      Heikki Linnakangas authored
      We had checks that a unique/primary key index must contain all the
      distribution key columns. Otherwise, rows with duplicate index keys might
      be stored on different segments, and we would fail to enforce the
      uniqueness. But the checks didn't pay any attention to operator classes.
      If the index's operator class had a different idea of what "equal" means,
      then you might have two rows that are identical according to the index
      operator class, but different according to the distribution key hash
      operator class, and be stored on different segments.
      
      Modify the checks to also check that the operator classes use the same
      equals operator. Refactor so that we can share the code doing the checks
      for both CREATE INDEX and ALTER TABLE SET DISTRIBUTED BY.
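      
      A short sketch of the rule being enforced (illustrative table):
      ```
      -- Sketch: a unique index must cover the distribution key and agree with the
      -- distribution opclass on what "equal" means.
      CREATE TABLE t_uniq (a int, b int) DISTRIBUTED BY (a);
      CREATE UNIQUE INDEX ON t_uniq (a, b);   -- OK: covers the distribution key
      -- CREATE UNIQUE INDEX ON t_uniq (b);   -- rejected: rows equal on b may be
      --                                      -- stored on different segments
      ```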
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6971
  21. 04 Dec 2019, 2 commits
  22. 28 Nov 2019, 1 commit
    • Only parse enough of each COPY row, to compute which segment to send it to. · a8aa1c4a
      Heikki Linnakangas authored
      In the common case that the distribution key is one of the first columns
      in a table, COPY FROM can scale a lot better if we only parse, and call
      the input functions for, the first columns in the QD, and delay parsing
      the rest of the row until it reaches the QEs. The overall effort is the
      same, but the QD can easily become a bottleneck.
      
      GPDB 5 did this, but we lost it in the refactoring of the COPY code in the
      9.1 merge. This brings it back.
      Update the answer file for the sreh test. The master now rejects the tuple
      "bad|8|8", and the next bad tuple it sees is "eleven", so that one is rejected
      as well. Previously, since the master parsed full tuples, it also saw incorrect
      data in non-distribution columns, hence it reported the error for an earlier
      tuple.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
  23. 26 Nov 2019, 1 commit
  24. 25 Nov 2019, 1 commit
  25. 08 Nov 2019, 1 commit
  26. 30 Oct 2019, 1 commit
  27. 29 Oct 2019, 1 commit
    • Elevate force_mirrors_to_catch_up() to gp_inject_fault · 5feccaae
      Soumyadeep Chakraborty authored
      This also applies the fix in efd76c4c to all callers of
      force_mirrors_to_catch_up(), thus eliminating flakes such as:
      
      ```
      -- Ensure that the mirrors including the standby master have removed the dboid dir under the target tablespace.
       SELECT gp_wait_until_triggered_fault('after_drop_database_directories', 1, dbid) FROM gp_segment_configuration WHERE role='m';
      - gp_wait_until_triggered_fault
      --------------------------------
      - Success:
      - Success:
      - Success:
      - Success:
      -(4 rows)
      -
      +ERROR:  failed to inject fault: ERROR:  fault not triggered, fault name:'after_drop_database_directories' fault type:'wait_until_triggered' (gp_inject_fault.c:127)
      +DETAIL:  DETAIL:  Timed-out as 10 minutes max wait happens until triggered.
      ```
  28. 28 Oct 2019, 1 commit
    • Fix various issues in Gang management (#8893) · 797065c5
      Paul Guo authored
      1. Do not call elog(FATAL) in cleanupQE(), since it could be called from
      cdbdisp_destroyDispatcherState() to destroy the CdbDispatcherState. That leads
      to reentrance of cdbdisp_destroyDispatcherState(), which is not supported.
      Change the code to return false instead and to sanity-check the reentrance.
      Returning false should be OK since that leads to the gang being destroyed, and
      thus the QE resources should be destroyed as well. Here is a typical stack of
      the reentrance:
      
      0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
      0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
      0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
      0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
      0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
      0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
      0x000000000053b8ab in AbortTransaction () at xact.c:3319
      0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
      0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
      0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
      0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
      0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
      0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
      0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
      0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
      0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
      0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
      0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
      0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442
      
      2. Force dropping the reader gang for a named portal if a SET command happened
      previously, since that setting was not dispatched to that gang and thus we
      should not reuse it.
      
      3. Now that we have the mechanism of destroying the DispatcherState in the
      resource owner callback when aborting a transaction, it is no longer necessary
      to destroy it in some of the dispatcher code.
      
      The added test cases and some existing test cases cover almost all of the code
      changes except the change in cdbdisp_dispatchX() (I cannot find a way to test
      this, and I'll keep in mind how to test that or similar code).
      
      Reviewed-by: Pengzhou Tang
      Reviewed-by: Asim R P
  29. 23 Oct 2019, 1 commit
  30. 09 Oct 2019, 1 commit
    • Rewrite the way SubPlans are "parallelized". · 00e25afe
      Heikki Linnakangas authored
      The old way to construct a plan for a correlated subquery was to plan
      the subquery as usual, except that Index Scans were not allowed. Then,
      after constructing a Plan tree, the post-processing step apply_motion()
      added Redistribute motion nodes so that the plan tree was executable on
      any node. That was pretty simplistic; disabling Index Scans completely
      obviously hurts performance. Also, because the planner didn't take into
      account that all base relations are actually redistributed everywhere,
      it was not reflected in the cost estimates, and the planner might choose
      a sub-optimal plan.
      
      To improve that, change the way that works, so that the Motions are added
      earlier in the planning. The planner needs to know which restrictions
      (WHERE clauses) refer to the outer query, i.e. which quals are correlated,
      and make sure that those quals are always evaluated in the same slice as
      the parent query. The outer query cannot pass parameters down through a
      Motion node, so the subquery plan must not contain any Motion nodes
      between the evaluation of the correlated variable, and the outer plan.
      This is enforced by having a new CdbPathLocus type, "OuterQuery". A node
      with OuterQuery locus must be evaluated in the outer query. It is mostly
      the same as "general", which means that it can be evaluated anywhere, but
      with the restriction that it is not OK to redistribute an input that has
      OuterQuery locus. Whenever the planner node evaluates a correlated var,
      that node must have Upper locus, by adding Motions below that node. This
      has similar effect as the old approach, but gives the planner a bit more
      flexibility, and the motions are taken into account in cost estimates.
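      
      For a concrete (made-up) example of the shape of query this concerns:
      ```
      -- Sketch: the qual "t2.b = t1.a" is correlated, so it must be evaluated in
      -- the same slice as the outer query; any Motions needed below that point
      -- are now added during planning and reflected in the cost estimates.
      SELECT *
      FROM t1
      WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
      ```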
      
      This allows using Index Scans in subplans, but only if the Index Quals
      don't contain correlated vars. This still isn't perfect; it would sometimes
      be good to, for example, delay the evaluation of a correlated var until
      later, above a join node, because that might avoid expensive Redistribute
      Motions. Even though this patch doesn't allow such plans yet, it's a step
      in the right direction.
      
      This moves the cdbllize() step to run *before* set_plan_references().
      Now that we no longer add Motion nodes to an already-constructed plan
      tree, we don't need the 'useExecutorVarFormat' stuff in many functions
      anymore.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/8648
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
  31. 04 Oct 2019, 1 commit
  32. 02 Oct 2019, 2 commits
  33. 26 Sep 2019, 1 commit
    • Fix crash in COPY FROM for non-distributed/non-replicated table · 6793882b
      Ashwin Agrawal authored
      The current code for COPY FROM picks COPY_DISPATCH mode even for a
      non-distributed/non-replicated table. This causes a crash. It should be
      using COPY_DIRECT, the normal/direct mode to be used for such tables.
      
      The crash was exposed by following SQL commands:
      
          CREATE TABLE public.heap01 (a int, b int) distributed by (a);
          INSERT INTO public.heap01 VALUES (generate_series(0,99), generate_series(0,98));
          ANALYZE public.heap01;
      
          COPY (select * from pg_statistic where starelid = 'public.heap01'::regclass) TO '/tmp/heap01.stat';
          DELETE FROM pg_statistic where starelid = 'public.heap01'::regclass;
          COPY pg_statistic from '/tmp/heap01.stat';
      
      Important note: yes, it's known and strongly recommended not to touch
      `pg_statistic` or any other catalog table this way. But a PANIC is no
      good either. With this change, the COPY into `pg_statistic` ERRORs
      out "correctly" with `cannot accept a value of type anyarray` instead of
      crashing, as there just isn't any way at the SQL level to insert data
      into pg_statistic's anyarray columns. Refer:
      https://www.postgresql.org/message-id/12138.1277130186%40sss.pgh.pa.us