1. 24 Jun 2020, 1 commit
  2. 22 Jun 2020, 1 commit
    • Fix flaky appendonly test. · f860ff0c
      (Jerome)Junfeng Yang authored
      This fixes the following test failure:
      ```
      --- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/appendonly.out	2020-06-16 08:30:46.484398384 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/appendonly.out	2020-06-16 08:30:46.556404454 +0000
      @@ -709,8 +709,8 @@
         SELECT oid FROM pg_class WHERE relname='tenk_ao2'));
             case    | objmod | last_sequence | gp_segment_id
              -----------+--------+---------------+---------------
            + NormalXid |      0 | 1-2900        |             1
              NormalXid |      0 | >= 3300       |             0
            - NormalXid |      0 | >= 3300       |             1
              NormalXid |      0 | >= 3300       |             2
              NormalXid |      1 | zero          |             0
              NormalXid |      1 | zero          |             1
      ```
      
      The flakiness is because, under ORCA, a `CREATE TABLE` statement without
      `DISTRIBUTED BY` treats the table as randomly distributed, whereas the
      planner treats it as distributed by the table's first column.
      
      ORCA:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL
      policy entry.
      ```
      
      Planner:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s)
      named 'unique1' as the Greenplum Database data distribution key for this
      table.
      ```
      
      So the data distribution of table tenk_ao2 is not what the test expects.
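      
      For reference, the usual way to make such a test deterministic is to pin the
      distribution explicitly; a minimal sketch, assuming `unique1` is the intended
      key (as the planner NOTICE above suggests):
      ```
      -- Sketch: an explicit DISTRIBUTED BY gives ORCA and the planner the same
      -- distribution, so the per-segment rows in the expected output stay stable.
      CREATE TABLE tenk_ao2 WITH (appendonly=true, compresslevel=0, blocksize=262144)
      AS SELECT * FROM tenk_heap
      DISTRIBUTED BY (unique1);
      ```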
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
  3. 08 Jun 2020, 1 commit
    • Retire guc gp_session_role (#9396) · f6297b96
      Paul Guo authored
      Use only the guc gp_role now, and fold the functionality of guc gp_session_role
      into it. Previously we had both gucs. The difference between the two gucs is
      (copied from a code comment):
      
       * gp_session_role
       *
       * - does not affect the operation of the backend, and
       * - does not change during the lifetime of PostgreSQL session.
       *
       * gp_role
       *
       * - determines the operating role of the backend, and
       * - may be changed by a superuser via the SET command.
      
      This is not friendly for coding. For example, Gp_role and Gp_session_role are
      both set to GP_ROLE_DISPATCH on the postmaster and on many aux processes on all
      nodes (even QE nodes) in a cluster, so to tell a QD postmaster apart from a QE
      postmaster, current gpdb passes an additional -E option in the postmaster
      arguments. This is confusing for developers writing role-dependent code, given
      that we have three related variables. Also, some related code is even buggy now
      (e.g. 'SET gp_role' FATAL-quits).
      
      With this patch we just have gp_role now. Some changes which might be
      interesting in the patch are:
      
      1. For postmaster, we should specify '-c gp_role=' (e.g. via pg_ctl argument) to
         determine the role else we assume the utility role.
      
      2. For stand-alone backend, utility role is enforced (no need to specify by
         users).
      
      3. Could still connect QE/QD nodes using utility mode with PGOPTIONS, etc as
         before.
      
      4. Remove the '-E' gpdb hacking and align the '-E' usage with upstream.
      
      5. Move pm_launch_walreceiver out of the fts-related shmem, given the latter is
         not used on QE.
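      
      As a rough illustration of the single-GUC behavior described above (not part
      of the patch text itself), the operating role of a session can simply be
      inspected; a utility-mode connection is the kind reached via PGOPTIONS as in
      point 3:
      ```
      -- Sketch: show the operating role of the backend you are connected to.
      -- A normal QD connection is expected to report 'dispatch'; a connection
      -- started with -c gp_role=utility reports 'utility'.
      SHOW gp_role;
      ```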
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Gang Xiong <gxiong@pivotal.io>
      Reviewed-by: Hao Wu <gfphoenix78@gmail.com>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
  4. 03 Jun 2020, 1 commit
    • Allow CLUSTER on append-optimized tables · 179feb77
      Andrey Borodin authored
      CLUSTER on AO tables is implemented by sorting the entire AO table with the
      tuplesort framework, according to a btree index defined on the table.
      
      A faster way to cluster is to scan the tuples in index-order, but this
      requires index-scan support.  Append-optimized tables do not support
      index-scans currently, but when this support is added, the cluster
      operation can be enhanced accordingly.
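      
      A minimal usage sketch (table and index names are made up):
      ```
      -- Sketch: clustering an append-optimized table by a btree index.
      CREATE TABLE ao_sales (id int, amount numeric)
          WITH (appendonly=true) DISTRIBUTED BY (id);
      CREATE INDEX ao_sales_id_idx ON ao_sales (id);
      -- The whole table is rewritten in index order via tuplesort:
      CLUSTER ao_sales USING ao_sales_id_idx;
      ```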
      
      Author: Andrey Borodin <amborodin@acm.org>
      Reviewed and slightly edited by: Asim R P <pasim@vmare.com>
      
      Merges GitHub PR #9996
  5. 31 May 2020, 2 commits
    • Allow AOCO tables to inherit a parent table. · 177abc76
      Heikki Linnakangas authored
      Because why not? This was forbidden back in 2012, with an old JIRA ticket
      that said:
      
          MPP-15735 - Inheritance is not supported with column oriented table
      
          This shouldn't be possible:
      
          Create table parent_tb2 (a1 int, a2 char(5), a3 text, a4 timestamp,
                 a5 date, column a1 ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768))
            with (appendonly=true,orientation=column)
            distributed by (a1);
      
          Create table child_tb4 (c1 int, c2 char) inherits(parent_tb2)
            with (appendonly = true, orientation = column);
      
          The reason is, dealing with column compression and inheritance is tricky
          and inheritance shouldn't even be supported.
      
      That explanation didn't go into any details, but I can see the trickiness.
      There are many places you can specify column compression options even
      without inheritance: in gp_default_storage_options GUC, in an ENCODING
      directive in CREATE TABLE command, in a "COLUMN foo ENCODING ..."
      directive in the CREATE TABLE command, or in the datatype's default
      storage options. Inheritance adds another dimension to it: should the
      current gp_default_storage_options override the options inherited from the
      parent? What if there are multiple parents with conflicting options?
      
      The easiest solution is to never inherit the column encoding options from
      the parent. That's a bit lame, but there's some precedent for that with
      partitions: if you ALTER TABLE ADD PARTITION, the encoding options are
      also not copied from the parent. So that's what this patch does.
      
      One interesting corner case is to specify the options for an inherited
      column with "CREATE TABLE (COLUMN parentcol ENCODING ...) INHERITS (...)".
      Thanks to the previous refactoring, that works, even though 'parentcol'
      is not explicitly listed in the CREATE TABLE command but inherited from
      the parent. To make dump & restore of that work correctly, modify pg_dump
      so that it always uses the "COLUMN foo ENCODING ..." syntax to specify
      the options, instead of just tacking "ENCODING ..." after the column
      definition.
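      
      A sketch of that corner case (names and encoding options are illustrative):
      ```
      -- Sketch: specifying ENCODING for a column that exists only in the parent.
      CREATE TABLE parent_aoco (a int, b text)
          WITH (appendonly=true, orientation=column) DISTRIBUTED BY (a);
      CREATE TABLE child_aoco (c int, COLUMN a ENCODING (compresstype=zlib, compresslevel=5))
          INHERITS (parent_aoco)
          WITH (appendonly=true, orientation=column) DISTRIBUTED BY (a);
      ```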
      
      One excuse for doing this right now is that even though we had forbidden
      "CREATE TABLE <tab> INHERITS (<parent>)" on AOCO tables, we missed "ALTER
      TABLE <tab> INHERIT <parent>". That was still allowed, and if you did
      that you got an inherited AOCO table that worked just fine, except that
      when it was dumped with pg_dump, the dump was unrestorable. It would have
      been trivial to add a check to forbid "ALTER TABLE INHERIT" on an AOCO
      table, instead, but it's even better to allow it.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/10111
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Refactor the way ENCODING clauses are parsed. · 391b78eb
      Heikki Linnakangas authored
      Instead of parsing them in the parse analysis phase, delay it until
      DefineRelation(), after merging the definitions of inherited attributes.
      We don't support INHERIT clause on an AOCO table today, but if we lift
      that limitation (as I'm planning to do in a follow-up commit), we need to
      be able to apply "COLUMN <col> ENCODING ..." clauses to inherited columns
      too.
      
      Before this, all callers of create_ctas_internal() had to also call
      AddDefaultRelationAttributeOptions(), in case it was an AOCO table. Now
      that DefineRelation() handles the default storage options, that's no
      longer needed.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  6. 22 May 2020, 1 commit
  7. 19 May 2020, 1 commit
    • Remove the pg_exttable catalog · e4b499aa
      Adam Lee authored
      Keep the syntax of external tables, but store them in the catalog as foreign
      tables with options.
      
      While using it, transform the foreign table options to an ExtTableEntry,
      which is compatible with external_table_shim, PXF and other custom
      protocols.
      
      Also, add `DISTRIBUTED` syntax support for `CREATE FOREIGN TABLE` if
      the foreign server indicates it's an external table.
      
      Note:
      1. All permissions handling is still in pg_authid, to keep the external tables'
         GRANT queries working; that will be moved in another PR if possible.
      2. Multiple URI locations are stored in the foreign table options, separated
         by `|`.
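      
      As a hedged sketch of where the definition now lives (the table name is made
      up, and actual option names may differ):
      ```
      -- Sketch: after this change an external table's options are stored in
      -- pg_foreign_table.ftoptions; multiple locations appear in a single option
      -- value separated by '|'.
      SELECT ft.ftoptions
      FROM pg_foreign_table ft
      JOIN pg_class c ON c.oid = ft.ftrelid
      WHERE c.relname = 'my_external_table';
      ```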
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
  8. 28 Apr 2020, 1 commit
  9. 26 Apr 2020, 1 commit
  10. 20 Mar 2020, 1 commit
    • Enable external table's error log to be persistent for ETL. (#9757) · 04fdd0a6
      (Jerome)Junfeng Yang authored
      In ETL scenarios, the same external table may be created and dropped
      frequently, and once the external table is dropped, all errors stored in its
      error log are lost.
      
      To make the error log persistent for external tables with the same
      "dbname"."namespace"."table", introduce the "error_log_persistent" external
      table option. If the external table is created with
      `OPTIONS (error_log_persistent 'true')` and `LOG ERRORS`, its error log is
      named "dbid_namespaceid_tablename" under the "errlogpersistent" directory,
      and dropping the external table does not delete that error log.
      
      Since GPDB 5 and 6 still use pg_exttable's options to mark
      LOG ERRORS PERSISTENTLY, keep the ability to load from
      OPTIONS (error_log_persistent 'true').
      
      Create a separate `gp_read_persistent_error_log` function to read the
      persistent error log. If the external table has been deleted, only the
      namespace owner has permission to read the error log.
      
      Create a separate `gp_truncate_persistent_error_log` function to delete the
      persistent error log. If the external table has been deleted, only the
      namespace owner has permission to delete the error log. It also supports
      wildcard input to delete the error logs belonging to a database or the whole
      cluster.
      
      If an external table created with `error_log_persistent` is dropped, and the
      same "dbname"."namespace"."table" external table is then created without
      persistent error logging, it writes errors to the normal error log; the
      persistent error log still exists.
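      
      A rough usage sketch, assuming a gpfdist location; the exact clause placement
      and the function arguments shown are assumptions, not taken from the patch:
      ```
      -- Sketch: create an external table whose error log survives DROP, then
      -- read and truncate the persistent log with the new functions.
      CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
          LOCATION ('gpfdist://etlhost:8081/sales*.txt')
          FORMAT 'TEXT'
          OPTIONS (error_log_persistent 'true')
          LOG ERRORS SEGMENT REJECT LIMIT 100;
      
      SELECT * FROM gp_read_persistent_error_log('ext_sales');    -- read entries
      SELECT gp_truncate_persistent_error_log('ext_sales');       -- delete entries
      ```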
      Reviewed-by: Haozhou Wang <hawang@pivotal.io>
      Reviewed-by: Adam Lee <ali@pivotal.io>
  11. 17 Mar 2020, 1 commit
    • Simplify the implementation of "dedup" semi-join paths. · 9628a332
      Heikki Linnakangas authored
      There are two parts to this commit:
      
      1. Instead of using ctids and "fake" ctids to disambiguate rows that have
         been duplicated by a Broadcast motion, generate a synthetic 64-bit
         rowid specifically for that purpose, just below the Broadcast. The old
         method had to generate subplan IDs, fake ctids etc. to have a
         combination of columns that formed the unique ID, and that entailed a
         lot of infrastructure, like the whole concept of "pseudo columns". The
         new method is much simpler, and it works the same way regardless of
         what's below the Broadcast. The new 64-bit rowids are generated by
         a new RowIdExpr expression node. The high 16 bits of the rowid are
         the segment ID, and the low bits are generated by a simple counter.
      
      2. Replace the cdbpath_dedup_fixup() post-processing step on the Plan
         tree, by adding the unique ID column (now a RowIdExpr) in the Path's
         pathtarget in the first place. cdbpath_motion_for_join() is the
         function that does that now. Seems like a logical place, since that's
         where any needed Motions are added on top of the join inputs. The
         responsibility for creating the Unique path is moved from
         cdb_add_join_path() to the create_nestloop/hashjoin/mergejoin_path()
         functions.
      
      This slightly changes the plan when one side of the join has Segment or
      SegmentGeneral locus. For example:
      
      select * from srf() r(a) where r.a in (select t.a/10 from tab t);
      
      With this commit, that's implemented by running the FunctionScan and
      generating the row IDs on only one node, and redistributing/broadcasting
      the result. This is not exactly the same as before this PR: previously, if
      the function was stable or immutable, we assumed that a SRF returns rows
      in the same order on all nodes and relied on that when generating the
      "synthetic ctid" on each row. We no longer assume that, but force the
      function to run on a single node instead. That seems acceptable: this kind of
      plan is helpful when the other side of the join is much larger, so
      redistributing the smaller side is probably still OK.
      
      This also uses the same strategy when the inner side is a replicated table
      rather than a SRF. That was not supported before, but there was a TODO
      comment that hinted at the possibility.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/9741, and adds tests
      for it.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
  12. 09 Mar 2020, 1 commit
    • Manage AO segments in QE nodes, instead of the master. · 5778f311
      Heikki Linnakangas authored
      Overview
      --------
      
      Manage the allocation of AO segfiles in QE nodes, instead of the master.
      That simplifies a lot of things. We no longer need to send the decision
      on which segfile to use from QD to QE, we no longer need to keep up-to-date
      tuple counts in the master, and so forth.
      
      This gets rid of the in-memory hash table to track AO segment status. To
      lock a segfile for insertion or other operations, we now rely on tuple
      locks on the AO segment table instead.
      
      VACUUM code has been refactored. I tried to make the code structure closer
      to upstream, to reduce merge conflicts as we merge with upstream. The way
      the VACUUM phases are dispatched, and the phases needed, are different in
      the new regime. Some phases could use local transactions, but others have
      to still use distributed transactions, but for simplicity we use
      distributed transactions for all of the phases.
      
      Now that QE nodes choose the insertion target segfile independently, we
      could use the same algorithm for choosing the target segfile in utility
      mode, and wouldn't need to reserve segfile #0 for special operations
      anymore. But to keep things familiar, and to avoid any more churn in the
      regression tests, this keeps the historical behavior and still reserves
      segfile #0 for utility mode and CREATE TABLE AS.
      
      Previously, SERIALIZABLE (and REPEATABLE READ) transactions were treated
      differently from transactions in READ COMMITTED mode. If there was an
      old serializable transaction running, VACUUM refrained from recycling
      old AO segfiles after compaction, because the serializable transaction
      might lock the table later and try to read the rows that were compacted
      away. But that was not completely waterproof; a READ COMMITTED transaction
      that's still holding onto an old snapshot is no different from a
      serializable transaction. There's a gap between acquiring a snapshot and
      locking a table, so even if a READ COMMITTED backend is not holding a lock
      on a table right now, it might open it later. To fix that, remove the
      distinction between serializable and other transactions. Any open
      transaction will now prevent AO segfiles in awaiting-drop state from being
      removed. This commit adds an isolation2 test case called
      'vacuum_self_function' to test one such case; it used to fail before this
      commit.
      
      That is a user-visible change in behavior that's worth calling out
      explicitly. If there are concurrent transactions running, even in
      READ COMMITTED mode, and even if they don't access the table at all,
      VACUUM will no longer be able to recycle compacted segfiles. Next VACUUM
      will clean them up, after the old concurrent transactions. Meanwhile,
      the awaiting-drop segfiles will use up disk space, and increase the risk
      that an insertion fails because all segfiles are in use.
      
      This work was originally submitted as GitHub PR #790; you might see references
      to that in the discussions.
      
      Code change details
      -------------------
      
      This gets rid of the in-memory hash table to track AO segment status. To
      lock a segfile for insertion or other operations, we now rely on tuple
      locks on the AO segment table instead. When an AO segment is chosen as
      insertion target, its AO segment table row is locked for the transaction
      (by heap_lock_tuple, i.e. by stamping its XMAX). Later, when the insertion
      finishes, the tuple is updated with the new EOF (no change there). When
      choosing a new insertion target segfile, we scan the pg_aoseg table instead
      of the hash table, and use the xmin/xmax to determine which ones are safe
      to use. The process of choosing a target segment itself is not
      concurrency-safe, so the relation extension lock is held to protect that.
      See new "Locking and snapshots" section in
      src/backend/access/appendonly/README.md for details.
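      
      To get a feel for the state that is now consulted directly, one can peek at a
      table's AO segment catalog; a sketch, where 16400 stands in for the AO table's
      actual segrel OID:
      ```
      -- Sketch: the per-table pg_aoseg relation that QEs now scan and tuple-lock
      -- when choosing an insertion target segfile.
      SELECT segno, eof, tupcount, modcount, state
      FROM pg_aoseg.pg_aoseg_16400;
      ```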
      
      tupcount and modcount are not kept up-to-date in the master. Remove all the
      code needed to send the totals back from QE to QD at end of a command,
      including the GPDB-specific 'o' protocol FE/BE message.
      
      Remove the machinery to track which PROCs are in a serializable transaction.
      It was not completely waterproof anyway, and there were some existing bugs
      in that area. A READ COMMITTED transaction that's still holding onto an old
      snapshot is no different from a serializable transaction. This commit adds
      an isolation2 test case called 'vacuum_self_function' to test one such case;
      it used to fail before this commit.
      
      Move responsibility of creating AO segfile to callers of
      appendonly_insert_init(). The caller must have locked, and created if
      necessary, the segfile by calling ChooseSegnoForWrite() or
      LockSegnoForWrite(). Previously, the pg_aoseg was not locked in rewriting
      ALTER TABLE and other such commands that always used segfile #0. That was
      OK, because they held an AccessExclusiveLock on the table, but let's be
      consistent.
      
      When selecting the insertion target for VACUUM compaction, prefer creating
      a new segfile over using an existing non-empty segfile. Existing empty
      segfiles are still choice #1, but let's avoid writing to non-empty
      existing files, because otherwise we cannot vacuum those segfiles in the
      same VACUUM cycle.
      
      Warn, if compaction fails because no insertion segfile could be chosen.
      This used to be an ERROR in the old code, but I think a WARNING is more
      appropriate.
      
      In targetid_get_partition(), don't create duplicate ResultRelInfo entries.
      It's possible that the "result relation" is a partition of a partitioned
      table, rather than the root. Don't create a new ResultRelInfo hash entry
      for it in that case, use the parent ResultRelInfo. This was harmless
      before, but now the logic of when we need to increment the modcount was
      getting confused. Per the 'qp_dropped_cols' regression test.
      
      Test change details
      -------------------
      
      src/test/regress/greenplum_schedule
      
          Run uao[cs]_catalog_tables tests separately.
      
          Because we are now more lazy with dropping AWAITING_DROP segments, the
          issue mentioned in the comment with serializable transactions now
          applies to all transactions.
      
      uao_compaction/threshold
      uaocs_compaction/threshold
      
          Adjust the uao[cs]_compaction/threshold test cases to make them pass.
      
          In the first few changed outputs, I'm not entirely sure why one segment
          chooses to insert the new tuples to segfile #2 while others pick
          segfile #1, but there's also nothing wrong with that. The point of this
          PR is that QE nodes can make the decision independently.
      
          In the last test with gp_toolkit.__gp_aovisimap_compaction_info(),
          all the test data is inserted to one QE node, so segfile 2 isn't
          allocated on others.
      
      isolation2/vacuum_self_function
      
          Add test case for the non-SERIALIZABLE case discussed earlier.
      
          See also discussion at
          https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ
      
          This test errored out before this commit:
      
          ERROR:  read beyond eof in table "ao" file "base/16384/16387.1", read position 0 (small offset 0), actual read length 0 (large read length 192) (cdbbufferedread.c:211)
      
          Works with this PR, but we don't drop old AO segments as aggressively
          in vacuum anymore. Partly because we're now more careful and correct.
          But also because we don't detect the case that the VACUUM is the oldest
          transaction in the system, and nothing else is running; we miss the
          opportunity to drop compacted AO segments immediately in that case. The
          next VACUUM should drop them though.
      
          Furthermore, this test is written so that if we tried to perform AO
          compaction phase in a local rather than distributed transaction, it
          would fail. (I know because I tried to do it that way at first.) This
          is "Problem 3" that I mentioned on that gpdb-dev thread.
      
      isolation2/fsync_ao
      
        Adjust "hit counts" in fsync_ao test.
      
        The number of times the fsyncs happen has changed. To be honest, I'm not
        sure why, nor what the principled way to get the correct value would be,
        but as long as the counts are > 0, this looks like success to me.
      
      isolation2/gdd/planner_insert_while_vacuum_drop
      
        Remove GDD test that's not relevant anymore.
      
        The test was testing a deadlock between QD and QE, involving AO vacuum.
        We no longer take an AccessExclusiveLock in the QD, so the test case
        doesn't work. I'm not sure what an equivalent deadlock might be with the
        new AO vacuum, so just remove the test.
      
      isolation2/vacuum_drop_phase_ao
      isolation2/uao/insert_should_not_use_awaiting_drop
      
          These tests don't make sense in the QE-managed regime anymore. I
          couldn't figure out what these should look like with this commit, so
          remove.
      
      isolation2/uao/insert_policy
      
          Update test case to cover a scenario where two transactions try to
          create the initial segment for relation. We had a bug in that scenario
          during development, and it wasn't covered by any existing tests.
      
      isolation2/uao/select_while_vacuum_serializable2
      
          The behavior is different now, a serializable transaction doesn't
          prevent compaction phase from happening, only the recycling of the
          segfile. Adjust expected output accordingly.
      
      isolation2/uao/compaction_utility
      
          VACUUM in utility mode can compact now.
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Xin Zhang <xzhang@pivotal.io>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/lUwtNxaT3f0/Cg39CP8uAwAJ
  13. 04 Mar 2020, 1 commit
    • Fix the way we check if a partition table is empty in Relcache translator · 12011396
      Abhijit Subramanya authored
      The relation statistics object stores a flag which indicates whether a relation
      is empty. However, this was being set based on the value of the stats_empty
      flag, which was set in the function `cdb_estimate_partitioned_numtuples`.
      It would be set to true if any of the child partitions had their relpages set
      to 0. But if other partitions had data in them, then the relation should not be
      considered empty. We should actually be checking the total number of rows
      to determine whether a partition table is empty. Also, the stats_empty flag
      was not being used for anything else, so this commit removes it.
  14. 31 Jan 2020, 3 commits
    • Refactor away special code for external tables in planner and executor · a55b9453
      Heikki Linnakangas authored
      This makes external tables less of a special case throughout the planner
      and executor. They're now mostly handled through the FDW API callbacks.
      
      * Add a FDW handler function and implement the API routines to construct
        ForeignScan paths and Plan nodes, iterate through a scan, and to insert
        tuples to the external/foreign table. The API routines just call through
        to the existing external table functions, like external_getnext() and
        external_insert().
      
      * Remove ExternalScan plan node in the executor and the ExternalScan path
        node from the planner. Use ForeignScan plan and path nodes instead, like
        for any other FDW. Move code related to external table planning to a
        new exttable_fdw_shim.c file.
      
      * The parameters previously carried in the  ExternalScan struct are now in
        a new ExternalScanInfo struct. (Or, the ExternalScan struct has been
        renamed to ExternalScanInfo, if you want to think of it that way.) It's
        not a plan node type anymore, but it still needs read/out function
        support so that the parameters can be serialized. Alternatively, the
        parameters could be carried in Lists of Values and other existing
        expression types, but a struct seems easier to handle. Perhaps the
        cleanest solution would be to use the ExtensibleNode infrastructure for
        it, but I'll leave that for another patch. As long as the external table
        FDW is in the backend anyway, it's simplest to have the out/read/copy
        funcs built-in as well.
      
      * Modify ForeignScan executor node so that it can make up "fake ctids" like
        ExternalScan did, and also add "squelching" support to it.
      
      * Remove special handling of external tables from ModifyTable node. It now
        uses the normal FDW routines for it.
      
      COPY still calls directly into the external_insert() function, because
      PostgreSQL doesn't support COPY into a foreign table until version 11. (We
      don't seem to have any tests for COPY TO/FROM an external table. TODO: add
      test.)
    • Change the catalog representation for external tables. · b62e0601
      Heikki Linnakangas authored
      External tables now use relkind='f', like all foreign tables. They have
      an entry in pg_foreign_table, as if they belonged to a special foreign
      server called "exttable_server". That foreign server gets special
      treatment in the planner and executor, so that we still plan and execute
      it the same as before.
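      
      A hedged sketch of how that now looks in the catalog (the table name is made
      up):
      ```
      -- Sketch: an external table is now a foreign table bound to the special
      -- "exttable_server" foreign server.
      SELECT c.relname, c.relkind, s.srvname
      FROM pg_class c
      JOIN pg_foreign_table ft ON ft.ftrelid = c.oid
      JOIN pg_foreign_server s ON s.oid = ft.ftserver
      WHERE c.relname = 'my_ext_table';
      -- expected: relkind = 'f', srvname = 'exttable_server'
      ```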
      
      * ALTER / DROP EXTERNAL TABLE is now mapped to ALTER / DROP FOREIGN TABLE.
        There is no "OCLASS_EXTTABLE" anymore. This leaks through to the user
        in error messages, e.g:
      
          postgres=# drop external table boo;
          ERROR:  foreign table "boo" does not exist
      
        and to the command tag on success:
      
          postgres=# drop external table boo;
          DROP FOREIGN TABLE
      
      * psql \d now prints external tables as Foreign Tables.
      
      Next steps:
      * Use the foreign table API routines instead of special casing
        "exttable_server" everywhere.
      
      * Get rid of the pg_exttable table, and store the all the options in
        pg_foreign_table.ftoptions instead.
      
      * Get rid of the extra fields in pg_authid to store permissions to
        create different kinds of external tables. Store them as ACLs in
        pg_foreign_server.
    • Remove restriction that you cannot use ALTER TABLE on external tables. · aac5391a
      Heikki Linnakangas authored
      In PostgreSQL, ALTER TABLE is allowed where ALTER FOREIGN TABLE is. Allow
      external tables the same leniency.
  15. 14 Jan 2020, 1 commit
    • Forgot to remove WARNING from expected output. · bb1c67f7
      Heikki Linnakangas authored
      Commit 25a7630e silenced printing a lot of spurious "time constraints
      added on superuser rule" warnings from the log. But there was also one
      such warning memorized in the expected output, which was overlooked by
      that commit. Remove the warning.
      
      The warning was memorized in the expected output in the 9.0 merge
      (commit 5ec5c30a), but it should never have been printed. Making a role
      superuser with "ALTER ROLE <role> SUPERUSER" drops all
      pg_auth_time_constraint entries for the role, so this
      check_auth_time_constraints() call shouldn't print such a warning.
  16. 11 Jan 2020, 1 commit
    • Refactor the late "parallelization" stages of the planner. · 93abe741
      Heikki Linnakangas authored
      * Build a preliminary "planner slice table" at the end of planning, and
        attach it to the PlannedStmt. Executor startup turns that into the
        final executor slice table. This replaces the step where executor
        startup scanned the whole Plan tree to build the slice table.
      
      * Now that the executor startup gets a pre-built planner slice table, it
        doesn't need the Flow structures for building the slice table anymore.
        Also refactor the few other remaining places in nodeMotion.c and
        nodeResult.c that accessed the Flows to use the information from the
        slice table instead. The executor no longer looks at the Flows at all,
        so we don't need to include them in the serialized plan tree anymore.
        ORCA translator doesn't need to build Flow structures anymore either.
        Instead, it now builds the planner slice table like the Postgres planner
        does.
      
      * During createplan.c processing, keep track of the current "slice", and
        attach direct dispatch and other per-slice information to the PlanSlice
        struct directly, instead of carrying it in the Flow structs. This
        renders the Flows mostly unused in the planner too, but there is still
        one thing we use the Flows for: to figure out when we need to add a
        Motion on top of a SubPlan's plan tree, to make the subplan's result
        available in the slice where the SubPlan is evaluated. There's a "sender
        slice" struct attached to a Motion during create_plan() processing to
        represent the sending slice. But the slice ID is not assigned at that
        stage yet. Motion / Slice IDs are assigned later, when the slice table
        is created.
      
      * Only set 'flow' on the topmost Plan, and in the child of a Motion.
      
      * Remove unused initplans and subplans near the end of planning, after
        set_plan_references(), but before building the planner slice table. We
        used to remove init plans and subplans a little bit earlier, before
        set_plan_references(), but there was at least one corner case involving
        inherited tables where you could have a SubPlan referring a subplan
        in the plan tree before set_plan_references(), but set_plan_references()
        transformed it away. You would end up with an unused subplan in that
        case, even though we previously removed any unused subplans. This way
        we don't need to deal with unused slices in the executor.
      
      * Rewrite the logic to account direct-dispatch cost saving in plan cost.
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
  17. 03 Jan 2020, 2 commits
    • Forbid DROP PARTITION with no partition name or value. · bf8dc8f4
      Heikki Linnakangas authored
      Previously, this was allowed:
      
          postgres=# ALTER TABLE parttab DROP PARTITION;
          ALTER TABLE
      
      It dropped the first partition, by rank. In other words, it was a
      shorthand for:
      
          ALTER TABLE parttab DROP PARTITION FOR (RANK (1));
      
      We don't need such a shorthand. It's too much of a footgun; if a user
      wants to drop a partition, he should name it properly. The documentation
      doesn't mention this syntax at all, so no docs changes are required.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
    • Fix error scenarios of alter_db_set_tablespace test · 5214fd9b
      Ashwin Agrawal authored
      The alter_db_set_tablespace test has scenarios that inject an error fault for
      content 0 and then run the ALTER DATABASE SET TABLESPACE command. Once the
      error is hit on content 0, the transaction is aborted. Depending on when the
      transaction gets aborted, it is unpredictable how far the command has
      progressed on the non-content-0 primaries. Only if a non-content-0 primary has
      reached the point of copying the directory will its abort record carry a
      database directory deletion to be replayed on the mirror. The test was waiting
      for the directory-deletion fault to be triggered on all content mirrors. This
      expectation is incorrect and makes the test flaky depending on timing.
      
      Hence, modify the error scenarios to only wait for directory deletion on
      content 0, then wait for all the mirrors to replay all the currently generated
      WAL records, and after that make sure the destination directory is empty. This
      should eliminate the flakiness from the test.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  18. 31 Dec 2019, 1 commit
    • Fix flakiness of alter_db_set_tablespace test · 1959053b
      Ashwin Agrawal authored
      The test should make sure the mirror has processed the drop-database WAL
      record before checking that the destination tablespace directory does not
      exist. It skipped that wait for content 0 in the case of a panic after writing
      the WAL record, which is incorrect.
      
      Now wait for all the mirrors to process the WAL record and only then perform
      the validation. This should fix the failures seen in CI with the diff below:
      
      ```
      --- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/alter_db_set_tablespace.out    2019-10-14 16:09:43.638372174 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/alter_db_set_tablespace.out    2019-10-14 16:09:43.714379108 +0000
      @@ -1262,25 +1271,352 @@
       CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 4
       CONTEXT:  PL/Python function "stat_db_objects"
      -NOTICE:  dboid dir for database alter_db does not exist on dbid = 5
      -CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 6
       CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 7
       CONTEXT:  PL/Python function "stat_db_objects"
       NOTICE:  dboid dir for database alter_db does not exist on dbid = 8
       CONTEXT:  PL/Python function "stat_db_objects"
      - dbid | relfilenode_dboid_relative_path | size
      -------+---------------------------------+------
      -    1 |                                 |
      -    2 |                                 |
      -    3 |                                 |
      -    4 |                                 |
      -    5 |                                 |
      -    6 |                                 |
      -    7 |                                 |
      -    8 |                                 |
      -(8 rows)
      + dbid | relfilenode_dboid_relative_path |  size
      +------+---------------------------------+--------
      +    1 |                                 |
      +    2 |                                 |
      +    3 |                                 |
      +    4 |                                 |
      +    5 | 180273/112                      |  32768
      +    5 | 180273/113                      |  32768
      +    5 | 180273/12390                    |  65536
      +    5 | 180273/12390_fsm                |  98304
      
      <....choping output as very long...>
      
      +    5 | 180273/PG_VERSION               |      4
      +    5 | 180273/pg_filenode.map          |   1024
      +    6 |                                 |
      +    7 |                                 |
      +    8 |                                 |
      +(337 rows)
      
      ```
      Reviewed-by: Asim R P <apraveen@pivotal.io>
  19. 25 Dec 2019, 1 commit
    • icw: fix flaky alter_db_set_tablespace · 54e3af6d
      Ning Yu authored
      The alter_db_set_tablespace test has been flaky for a long time, one
      typical failure is like this:
      
          --- /regress/expected/alter_db_set_tablespace.out
          +++ /regress/results/alter_db_set_tablespace.out
          @@ -1204,21 +1213,348 @@
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 2
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 3
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 4
          -NOTICE:  dboid dir for database alter_db does not exist on dbid = 5
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 6
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 7
           NOTICE:  dboid dir for database alter_db does not exist on dbid = 8
      
      The test disables fts probing with fault injection, but it does not wait for
      the fault to be triggered. The other problem is that fts probing was disabled
      after the PANIC, which might not be in time.
      
      So the problem was a scenario where the fault was injected after the fts loop
      was already past the fault point, and when the subsequent PANIC occurred, fts
      was still active.
      
      By manually triggering, and then by waiting to ensure that the fault is
      hit at least once, we can guarantee that the scenario described above
      doesn't happen.
      Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
      Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
  20. 23 Dec 2019, 2 commits
    • Allow exclusion constraints. · cdb61585
      Heikki Linnakangas authored
      Exclusion constraints can be enforced if the table is replicated, or if
      the distribution key columns are part of the constraint, with the same =
      operator as in the distribution key's hash opclass. This is analogous to
      the similar restrictions on unique indexes.
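      
      A minimal sketch of a constraint that satisfies those conditions, assuming the
      btree_gist extension for the integer = operator under GiST:
      ```
      -- Sketch: the distribution key (room) is part of the constraint with the
      -- same = operator, so enforcing the exclusion per segment is sufficient.
      CREATE EXTENSION IF NOT EXISTS btree_gist;
      CREATE TABLE meetings (
          room   int,
          during tsrange,
          EXCLUDE USING gist (room WITH =, during WITH &&)
      ) DISTRIBUTED BY (room);
      ```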
      
      Exclusion constraints on partitioned tables would have similar
      restrictions, but because of the way creation of partitioned tables
      currently works, checking for those restrictions would be tricky. A
      partitioned table is only marked as partitioned after the indexes have
      been created, so we cannot check for compatibility with partitioning keys
      during index creation, like we can with distribution keys. Refactoring
      creation of partitioned tables is more than I can chew right now, so for
      now, continue to disallow exclusion constraints on partitioned tables.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Kalen Krempely <kkrempely@pivotal.io>
    • Fix cross-checks between unique/primary keys and distribution keys. · a2b1050d
      Heikki Linnakangas authored
      We had checks that a unique/primary key index must contain all the
      distribution key columns. Otherwise, rows with duplicate index keys might
      be stored on different segments, and we would fail to enforce the
      uniqueness. But the checks didn't pay any attention to operator classes.
      If the index's operator class had a different idea of what "equal" means,
      then you might have two rows that are identical according to the index
      operator class, but different according to the distribution key hash
      operator class, and be stored on different segments.
      
      Modify the checks to also check that the operator classes use the same
      equals operator. Refactor so that we can share the code doing the checks
      for both CREATE INDEX and ALTER TABLE SET DISTRIBUTED BY.
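      
      A short sketch of the rule being enforced (illustrative table):
      ```
      -- Sketch: a unique index must cover the distribution key and agree with the
      -- distribution opclass on what "equal" means.
      CREATE TABLE t_uniq (a int, b int) DISTRIBUTED BY (a);
      CREATE UNIQUE INDEX ON t_uniq (a, b);   -- OK: covers the distribution key
      -- CREATE UNIQUE INDEX ON t_uniq (b);   -- rejected: rows equal on b may be
      --                                      -- stored on different segments
      ```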
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6971
  21. 04 Dec 2019, 2 commits
  22. 28 Nov 2019, 1 commit
    • Only parse enough of each COPY row, to compute which segment to send it to. · a8aa1c4a
      Heikki Linnakangas authored
      In the common case that the distribution key is one of the first columns
      in a table, COPY FROM can scale a lot better if we only parse, and call
      the input functions for, the first columns in the QD, and delay parsing
      the rest of the row until it reaches the QEs. The overall effort is the
      same, but the QD can easily become a bottleneck.
      
      GPDB 5 did this, but we lost it in the refactoring of the COPY code in the
      9.1 merge. This brings it back.
      Update the answer file for the sreh test. The master now rejects the tuple
      "bad|8|8", and the next bad tuple it sees is "eleven", so that one is rejected
      as well. Previously, since the master parsed full tuples, it also saw incorrect
      data in non-distribution columns, hence it reported the error for an earlier
      tuple.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
  23. 26 Nov 2019, 1 commit
  24. 25 Nov 2019, 1 commit
  25. 08 Nov 2019, 1 commit
  26. 30 Oct 2019, 1 commit
  27. 29 Oct 2019, 1 commit
    • Elevate force_mirrors_to_catch_up() to gp_inject_fault · 5feccaae
      Soumyadeep Chakraborty authored
      This also applies the fix in efd76c4c to all callers of
      force_mirrors_to_catch_up(), thus eliminating flakes such as:
      
      ```
      -- Ensure that the mirrors including the standby master have removed the dboid dir under the target tablespace.
       SELECT gp_wait_until_triggered_fault('after_drop_database_directories', 1, dbid) FROM gp_segment_configuration WHERE role='m';
      - gp_wait_until_triggered_fault
      --------------------------------
      - Success:
      - Success:
      - Success:
      - Success:
      -(4 rows)
      -
      +ERROR:  failed to inject fault: ERROR:  fault not triggered, fault name:'after_drop_database_directories' fault type:'wait_until_triggered' (gp_inject_fault.c:127)
      +DETAIL:  DETAIL:  Timed-out as 10 minutes max wait happens until triggered.
      ```
  28. 28 Oct 2019, 1 commit
    • Fix various issues in Gang management (#8893) · 797065c5
      Paul Guo authored
      1. Do not call elog(FATAL) in cleanupQE(), since it could be called from
      cdbdisp_destroyDispatcherState() to destroy the CdbDispatcherState. That leads
      to reentrance of cdbdisp_destroyDispatcherState(), which is not supported.
      Change the code to return false instead and to sanity-check the reentrance.
      Returning false should be OK since that leads to the gang being destroyed, and
      thus the QE resources should be destroyed as well. Here is a typical stack of
      the reentrance:
      
      0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
      0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
      0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
      0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
      0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
      0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
      0x000000000053b8ab in AbortTransaction () at xact.c:3319
      0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
      0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
      0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
      0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
      0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
      0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
      0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
      0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
      0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
      0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
      0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
      0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442
      
      2. Force dropping the reader gang for a named portal if a SET command happened
      previously, since that setting was not dispatched to that gang and thus we
      should not reuse it.
      
      3. Now that we have the mechanism of destroying the DispatcherState in the
      resource owner callback when aborting a transaction, it is no longer necessary
      to destroy it in some of the dispatcher code.
      
      The added test cases and some existing test cases cover almost all of the code
      changes except the change in cdbdisp_dispatchX() (I cannot find a way to test
      this, and I'll keep in mind how to test that or similar code).
      
      Reviewed-by: Pengzhou Tang
      Reviewed-by: Asim R P
  29. 23 Oct 2019, 1 commit
  30. 09 Oct 2019, 1 commit
    • Rewrite the way SubPlans are "parallelized". · 00e25afe
      Heikki Linnakangas authored
      The old way to construct a plan for a correlated subquery was to plan
      the subquery as usual, except that Index Scans were not allowed. Then,
      after constructing a Plan tree, the post-processing step apply_motion()
      added Redistribute motion nodes so that the plan tree was executable on
      any node. That was pretty simplistic; disabling Index Scans completely
      obviously hurts performance. Also, because the planner didn't take into
      account that all base relations are actually redistributed everywhere,
      it was not reflected in the cost estimates, and the planner might choose
      a sub-optimal plan.
      
      To improve that, change the way that works, so that the Motions are added
      earlier in the planning. The planner needs to know which restrictions
      (WHERE clauses) refer to the outer query, i.e. which quals are correlated,
      and make sure that those quals are always evaluated in the same slice as
      the parent query. The outer query cannot pass parameters down through a
      Motion node, so the subquery plan must not contain any Motion nodes
      between the evaluation of the correlated variable, and the outer plan.
      This is enforced by having a new CdbPathLocus type, "OuterQuery". A node
      with OuterQuery locus must be evaluated in the outer query. It is mostly
      the same as "general", which means that it can be evaluated anywhere, but
      with the restriction that it is not OK to redistribute an input that has
      OuterQuery locus. Whenever the planner node evaluates a correlated var,
      that node must have Upper locus, by adding Motions below that node. This
      has similar effect as the old approach, but gives the planner a bit more
      flexibility, and the motions are taken into account in cost estimates.
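      
      For a concrete (made-up) example of the shape of query this concerns:
      ```
      -- Sketch: the qual "t2.b = t1.a" is correlated, so it must be evaluated in
      -- the same slice as the outer query; any Motions needed below that point
      -- are now added during planning and reflected in the cost estimates.
      SELECT *
      FROM t1
      WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
      ```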
      
      This allows using Index Scans in subplans, but only if the Index Quals
      don't contain correlated vars. This still isn't perfect; it would sometimes
      be good to, for example, delay the evaluation of a correlated var until
      later, above a join node, because that might avoid expensive Redistribute
      Motions. Even though this patch doesn't allow such plans yet, it's a step
      in the right direction.
      
      This moves the cdbllize() step to run *before* set_plan_references().
      Now that we no longer add Motion nodes to an already-constructed plan
      tree, we don't need the 'useExecutorVarFormat' stuff in many functions
      anymore.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/8648
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
  31. 04 Oct 2019, 1 commit
  32. 02 Oct 2019, 2 commits
  33. 26 Sep 2019, 1 commit
    • Fix crash in COPY FROM for non-distributed/non-replicated table · 6793882b
      Ashwin Agrawal authored
      The current code for COPY FROM picks COPY_DISPATCH mode even for a
      non-distributed/non-replicated table. This causes a crash. It should be
      using COPY_DIRECT, the normal/direct mode to be used for such tables.
      
      The crash was exposed by following SQL commands:
      
          CREATE TABLE public.heap01 (a int, b int) distributed by (a);
          INSERT INTO public.heap01 VALUES (generate_series(0,99), generate_series(0,98));
          ANALYZE public.heap01;
      
          COPY (select * from pg_statistic where starelid = 'public.heap01'::regclass) TO '/tmp/heap01.stat';
          DELETE FROM pg_statistic where starelid = 'public.heap01'::regclass;
          COPY pg_statistic from '/tmp/heap01.stat';
      
      Important note: yes, it's known and strongly recommended not to touch
      `pg_statistic` or any other catalog table this way. But a PANIC is no
      good either. With this change, the COPY into `pg_statistic` ERRORs
      out "correctly" with `cannot accept a value of type anyarray` instead of
      crashing, as there just isn't any way at the SQL level to insert data
      into pg_statistic's anyarray columns. Refer:
      https://www.postgresql.org/message-id/12138.1277130186%40sss.pgh.pa.us