1. 07 Dec 2018, 1 commit
  2. 04 Dec 2018, 1 commit
  3. 03 Dec 2018, 1 commit
  4. 29 Nov 2018, 7 commits
  5. 14 Nov 2018, 1 commit
    • Avoid freeing memory in error path · de09e863
      Authored by Daniel Gustafsson
      Erroring out via ereport(ERROR ..) will clean up resources allocated
      during execution, so explicitly freeing right before is not useful
      (unless the allocation is in the TopMemoryContext).  Remove pfree()
      calls for lower allocations, and reorder one to happen just after a
      conditional ereport instead to make for slightly easier debugging
      when breaking on the error.
      Reviewed-by: Jacob Champion <pchampion@pivotal.io>
  6. 13 Nov 2018, 1 commit
    • Support 'copy (select statement) to file on segment' (#6077) · bad6cebc
      Authored by Jinbao Chen
      In 'copy (select statement) to file', we generate a query plan, set
      its dest receiver to copy_dest_receiver, and run the dest receiver on the QD.
      In 'copy (select statement) to file on segment', we modify the query plan,
      delete the gather motion, and let the dest receiver run on the QEs.
      
      Change 'isCtas' in Query to 'parentStmtType' to be able to mark the upper
      utility statement type. Add a CopyIntoClause node to store copy
      information. Add copyIntoClause to PlannedStmt.
      
      In PostgreSQL, we don't need to make a different query plan for the
      query in a utility statement, but in Greenplum we do. So we use a field
      to indicate whether the query is contained in a utility statement, and
      the type of that utility statement.
      
      The behavior of 'copy (select statement) to file on segment' is very
      similar to 'SELECT ... INTO ...' and 'CREATE TABLE ... AS SELECT ...'.
      We use the distribution policy inherent in the query result as the final
      data distribution policy; if there is none, we use the first column in
      the target list as the key and redistribute. The only difference is that
      we use 'copy_dest_receiver' instead of 'intorel_dest_receiver'.
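
      A minimal usage sketch (the table, predicate and path are illustrative;
      '<SEGID>' follows the existing ON SEGMENT convention of being replaced
      with each segment's ID):

      ```sql
      -- Each QE writes its own part of the query result to a local file,
      -- instead of funneling all rows through the QD.
      COPY (SELECT * FROM sales WHERE amount > 100)
          TO '/tmp/sales_part_<SEGID>.csv' ON SEGMENT;
      ```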
  7. 06 Nov 2018, 1 commit
  8. 05 Nov 2018, 1 commit
  9. 29 Oct 2018, 1 commit
  10. 25 Oct 2018, 1 commit
    • Unify the way to fetch/manage the number of segments (#6034) · 8eed4217
      Authored by Tang Pengzhou
      * Don't use GpIdentity.numsegments directly for the number of segments
      
      Use getgpsegmentCount() instead.
      
      * Unify the way to fetch/manage the number of segments
      
      Commit e0b06678 lets us expand a GPDB cluster without a restart, so
      the number of segments may change during a transaction and we need
      to take care of numsegments.
      
      We now have two ways to get the number of segments: 1) from
      GpIdentity.numsegments, and 2) from gp_segment_configuration
      (cdb_component_dbs), which the dispatcher uses to decide the range of
      segments to dispatch to. We did a lot of work in e0b06678 to keep
      GpIdentity.numsegments up to date, which made the management of segments
      more complicated; now we want a simpler approach:
      
      1. We only allow getting segment info (including the number of segments)
      through gp_segment_configuration, which always has the newest segment info
      (see the query sketch after this list). There is no need to update
      GpIdentity.numsegments any more; it is left only for debugging and it can
      be removed entirely in the future.
      
      2. Each global transaction fetches/updates the newest snapshot of
      gp_segment_configuration and never changes it until the end of the
      transaction unless a writer gang is lost, so a global transaction sees a
      consistent state of segments. We used to use gxidDispatched to do the
      same thing; now it can be removed.
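
      The catalog query below is a sketch of what "getting the segment count
      through gp_segment_configuration" looks like (primary segments are the
      rows with content >= 0 and role = 'p'):

      ```sql
      -- Number of primary segments, derived from the catalog rather than
      -- from GpIdentity.numsegments.
      SELECT count(*) AS numsegments
        FROM gp_segment_configuration
       WHERE content >= 0 AND role = 'p';
      ```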
      
      * Remove GpIdentity.numsegments
      
      GpIdentity.numsegments has no effect now, so remove it. This commit does
      not remove gp_num_contents_in_cluster because that requires modifying
      utilities like gpstart, gpstop and gprecoverseg; let's do that cleanup
      work in another PR.
      
      * Exchange the default UP/DOWN value in fts cache
      
      Previously, the FTS prober read gp_segment_configuration, checked the
      status of segments and then set the status of segments in the shared
      memory array ftsProbeInfo->fts_status[], so other components (mainly
      the dispatcher) could detect that a segment was down.
      
      All segments were initialized as DOWN and then updated to UP in the most
      common case, which brings two problems:

      1. fts_status is invalid until FTS completes its first loop, so the QD
      needs to check ftsProbeInfo->fts_statusVersion > 0.
      2. When gpexpand adds a new segment to gp_segment_configuration, the
      newly added segment may be treated as DOWN if FTS hasn't scanned it
      yet.
      
      This commit changes the default value from DOWN to UP, which resolves
      the problems mentioned above.
      
      * FTS should not be used to notify backends that a gpexpand has occurred

      As Ashwin mentioned in PR#5679, "I don't think giving FTS responsibility to
      provide new segment count is right. FTS should only be responsible for HA
      of the segments. The dispatcher should independently figure out the count
      based on catalog. gp_segment_configuration should be the only way to get
      the segment count", so FTS should be decoupled from gpexpand.
      
      * Access gp_segment_configuration inside a transaction
      
      * Upgrade the log level from ERROR to FATAL if the expand version has changed

      * Modify gpexpand test cases according to the new design
  11. 19 Oct 2018, 1 commit
    • Fix error handling in "COPY <table> TO <file>". · eab449f8
      Authored by Heikki Linnakangas
      If an error occurred in the segments, in a "COPY <table> TO <file>"
      command, the COPY was stopped, but the error was not reported to the user.
      That gave the false impression that it finished successfully, but what you
      actually got was an incomplete file.
      
      A test case is included. It uses a little helper output function that
      sometimes throws an error. Output functions are fairly unlikely to fail,
      but it could happen e.g. because of an out of memory error, or a disk
      failure. The "COPY (SELECT ...) TO <file>" variant did not suffer from
      this (otherwise, a query that throws an error would've been a much simpler
      way to test this.)
      
      The reason for this was that the code in cdbCopyGetData() that called
      PQgetResult(), and extracted the error message from the result, didn't
      indicate to the caller in any way that the error happened. To fix, delay
      the call to PQgetResult(), to a later call to cdbCopyEnd(). cdbCopyEnd()
      already had the logic to extract the error information from the PGresult,
      and throw it to the user. While we're at it, refactor cdbCopyEnd a
      little bit, to give the callers a nicer function signature.
      
      I also changed a few places that used 32-bit int to store rejected row
      counts, to use int64 instead. There was a FIXME comment about that. I
      didn't fix all the places that do that, though, so I moved the FIXME to
      one of the remaining places.
      
      Apply to master branch only. GPDB 5 didn't handle this too well, either;
      with the included test case, you got an error like this:
      
      postgres=# copy broken_type_test to '/tmp/x';
      ERROR:  missing error text
      
      That's not very nice, but at least you get an error, even if it's not a very
      good one. The code looks quite different in 5X_STABLE, so I'm not going to
      attempt improving that.
      Reviewed-by: Adam Lee <ali@pivotal.io>
  12. 28 Sep 2018, 1 commit
    • Allow tables to be distributed on a subset of segments · 4eb65a53
      Authored by ZhangJackey
      There was an assumption in GPDB that a table's data is always distributed
      on all segments. However, this is not always true: for example, when a
      cluster is expanded from M segments to N (N > M), all the tables are still
      on M segments. To work around the problem we used to have to alter all the
      hash-distributed tables to randomly distributed in order to get correct
      query results, at the cost of bad performance.
      
      Now we support distributing a table's data on a subset of segments.

      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  By doing so we can allow DML on tables that live on M
      segments, and joins between M-segment and N-segment tables are also
      supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
  13. 06 Sep 2018, 1 commit
    • Fix command tag in "COPY (SELECT ...) TO <file>". · f5130c20
      Authored by Heikki Linnakangas
      It used to always say "COPY 0", instead of the number of rows copied. This
      source line was added in PostgreSQL 9.0 (commit 8ddc05fb), but it was
      missed in the merge. Add a test case to check the command tags of
      different variants of COPY, including this one.
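
      A small illustration of the fixed behaviour (path and row count are
      arbitrary): after this change psql reports the real tag, e.g. "COPY 3"
      rather than "COPY 0".

      ```sql
      COPY (SELECT generate_series(1, 3)) TO '/tmp/three_rows.txt';
      -- expected command tag: COPY 3
      ```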
  14. 05 Sep 2018, 1 commit
  15. 15 Aug 2018, 1 commit
  16. 14 Aug 2018, 1 commit
    • Refine dispatching of COPY command · a1b6b2ae
      Authored by Pengzhou Tang
      Previously, COPY used CdbDispatchUtilityStatement directly to
      dispatch 'COPY' statements to all QEs and then sent/received
      data through the primaryWriterGang. This happened to work because
      the primaryWriterGang is not recycled when a dispatcher state is
      destroyed, but it is nasty because at that point the COPY command
      has logically finished.
      
      This commit splits the COPY dispatching logic into two parts to
      make it more reasonable.
  17. 03 Aug 2018, 2 commits
  18. 02 Aug 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Authored by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints are now performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plans can now be generated for specific parameter values even when
        using prepared statements.

      * The FDW API was improved so that FDWs can provide multiple access "paths"
        for their tables, allowing more flexibility in join planning.

      * The security_barrier option was added for views to prevent optimizations
        that might allow view-protected data to be exposed to users (see the
        sketch after this list).

      * Range data types were added to store a lower and an upper bound of a
        base data type.

      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during execution of the utility statement. To conform
        to this change, GPDB executes the utility statement only on the QD and
        dispatches the plan of the SELECT query to the QEs.
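
      As a quick, hedged illustration of two of the new 9.2 features mentioned
      above (the accounts table and its columns are hypothetical):

      ```sql
      -- Range types: does the int4 range [10,20) contain 15?
      SELECT int4range(10, 20) @> 15;

      -- security_barrier views: prevents qual pushdown that could leak hidden rows.
      CREATE VIEW visible_accounts WITH (security_barrier) AS
          SELECT id, owner FROM accounts WHERE NOT hidden;
      ```
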
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  19. 11 Jul 2018, 1 commit
    • Improve handling of rd_cdbpolicy. · 0bfc7251
      Authored by Ashwin Agrawal
      Pointers from a Relation object need to be handled with special care, as
      holding a refcount on the object doesn't mean the object is not modified.
      When a cache invalidation message is handled, the Relation object gets
      *rebuilt*. The only guarantee maintained during the rebuild is that the
      Relation object's address will not change; the memory addresses inside the
      Relation object get freed, freshly allocated and populated with the latest
      data from the catalog.
      
      For example, the code sequence below is dangerous:
      
          rel->rd_cdbpolicy = original_policy;
          GpPolicyReplace(RelationGetRelid(rel), original_policy);
      
      If a relcache invalidation message is served after the assignment to
      rd_cdbpolicy, the rebuild will free the memory behind rd_cdbpolicy (which is
      original_policy) and replace it with the current contents of
      gp_distribution_policy. GpPolicyReplace(), called with original_policy, will
      then access freed memory. Plus, rd_cdbpolicy will hold the stale value in the
      cache rather than the intended refreshed value. This issue was hit in CI a few
      times and reproduces with higher frequency with `-DRELCACHE_FORCE_RELEASE`.
      
      Hence this patch fixes all uses of rd_cdbpolicy to use the rd_cdbpolicy
      pointer directly from the Relation object, and to update the catalog first
      before assigning the value to rd_cdbpolicy.
  20. 10 Jul 2018, 1 commit
  21. 04 Jul 2018, 1 commit
  22. 27 Jun 2018, 1 commit
  23. 19 Jun 2018, 2 commits
  24. 11 Jun 2018, 1 commit
    • Fix external table with non-UTF8 encoding data · 6822104f
      Authored by Adam Lee
      1. Pass the external table's encoding to COPY's options, then set
      cstate->file_encoding to it, for both reading and writing.

      2. After the merge, the copy state no longer has a client-encoding member,
      which used to be set to the target encoding so that the converted data was
      received as a client would receive it; now pass the file encoding (from the
      copy options) to convert directly.
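
      A usage sketch under assumed names (host, port, file and columns are
      placeholders); the ENCODING clause is the value that now flows into the
      COPY options as cstate->file_encoding:

      ```sql
      CREATE READABLE EXTERNAL TABLE ext_latin1 (id int, name text)
          LOCATION ('gpfdist://etlhost:8081/data_latin1.txt')
          FORMAT 'TEXT' (DELIMITER '|')
          ENCODING 'LATIN1';

      -- Rows are converted from LATIN1 while being read.
      SELECT * FROM ext_latin1;
      ```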
  25. 25 May 2018, 1 commit
    • Fix an issue with COPY FROM for partition tables · 01a22423
      Authored by Jimmy Yih
      The Postgres 9.1 merge introduced a problem where issuing a COPY FROM
      to a partition table could result in an unexpected error, "ERROR:
      extra data after last expected column", even though the input file was
      correct. This would happen if the partition table had partitions where
      the relnatts were not all the same (e.g. ALTER TABLE DROP COLUMN,
      ALTER TABLE ADD COLUMN, and then ALTER TABLE EXCHANGE PARTITION). The
      internal COPY logic would always use the COPY state's relation, the
      partition root, instead of the actual partition's relation to obtain
      the relnatts value. In fact, the only reason this is seen only
      intermittently is that the COPY logic, when working on a leaf partition
      relation that has a different relnatts value, was reading beyond a
      boolean array's allocated memory and got a phony value that happened to
      evaluate to TRUE.
      Co-authored-by: Jimmy Yih <jyih@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
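
      A hypothetical reproduction sketch of the relnatts mismatch (all names,
      paths and values are made up; the exchanged table lacks the dropped-column
      placeholder, so its relnatts differs from the older partitions'):

      ```sql
      CREATE TABLE pt (a int, b int, c text)
          DISTRIBUTED BY (a)
          PARTITION BY RANGE (b) (START (0) END (10) EVERY (5));
      ALTER TABLE pt DROP COLUMN c;
      ALTER TABLE pt ADD COLUMN c text;

      -- A freshly created table has no dropped-column placeholder.
      CREATE TABLE ex (a int, b int, c text) DISTRIBUTED BY (a);
      ALTER TABLE pt EXCHANGE PARTITION FOR (0) WITH TABLE ex;

      -- This used to fail intermittently with
      -- "ERROR: extra data after last expected column".
      COPY pt FROM '/tmp/pt_data.txt';
      ```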
  26. 17 May 2018, 1 commit
    • COPY: expand the type of numcompleted to 64 bits · 8d40268b
      Authored by Adam Lee
      Without this, integer overflow occurs when more than 2^31 rows are copied
      in `COPY ON SEGMENT` mode.

      Errors happen when the value is cast to uint64, the type of `processed` in
      `CopyStateData`: a third-party Postgres driver, which takes it as an
      int64, fails with an out-of-range error.
  27. 08 May 2018, 1 commit
  28. 29 Mar 2018, 1 commit
    • Support replicated table in GPDB · 7efe3204
      Authored by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash or
      randomly. There is a requirement for a new table type, called a replicated
      table, in which every segment holds a full and identical copy of the table
      data.
      
      To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to
      mark a replicated table, and a new locus type named CdbLocusType_SegmentGeneral to
      describe the distribution of a replicated table's tuples.  CdbLocusType_SegmentGeneral
      implies the data is generally available on all segments but not on the qDisp, so a plan
      node with this locus type can be flexibly planned to execute on either a single QE or
      all QEs. It is similar to CdbLocusType_General; the only difference is that a
      CdbLocusType_SegmentGeneral node can't be executed on the qDisp. To guarantee this, we
      try our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when
      planning motions for a join, even if the other rel has a bottleneck locus type. A
      problem is that such a motion may be redundant if the single QE is not promoted to
      execute on the qDisp in the end, so we need to detect that case and omit the redundant
      motion at the end of apply_motion(). We don't reuse CdbLocusType_Replicated since it
      always implies a broadcast motion below it, and it's not easy to plan such a node as
      direct dispatch to avoid getting duplicate data.
      
      We don't support replicated tables with an INHERITS or PARTITION BY clause yet; the
      main problem is that update/delete on multiple result relations can't work correctly
      now. We can fix this later.
      
      * Allow spi_* to access replicated table on QE
      
      Previously, GPDB didn't allow a QE to access non-catalog tables because the
      data is incomplete; we can remove this limitation now if the QE only accesses
      replicated tables.

      One problem is that a QE needs to know whether a table is replicated.
      Previously, QEs didn't maintain the gp_distribution_policy catalog, so we
      need to pass the policy info to the QEs for replicated tables.
      
      * Change schema of gp_distribution_policy to identify replicated table
      
      Previously, we used a magic number, -128, in the gp_distribution_policy table
      to identify a replicated table, which was quite a hack, so we add a new column
      to gp_distribution_policy to identify replicated tables and partitioned
      tables.

      This commit also abandons the old way that used a 1-length NULL list and a
      2-length NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED
      FULLY clauses.

      Besides, this commit refactors the code to make the decision-making on
      distribution policy clearer.
      
      * Support COPY for replicated tables
      
      * Disable the row ctid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to handle queries
        like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t2), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
               ->  Hash Join
                     Hash Cond: t2.c2 = t1.c2
                     ->  Seq Scan on t2
                     ->  Hash
                           ->  Seq Scan on t1

        Obviously, the plan is wrong if t1 is a replicated table because ctid
        + gp_segment_id can't identify a tuple: in a replicated table, a logical
        row may have a different ctid and gp_segment_id on each segment. So we
        disable such plans for replicated tables temporarily. This is not the
        best way, because the rowid unique path may be cheaper than a normal
        hash semi-join, so we left a FIXME for later optimization.
      
      * ORCA related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated tables
      
      * Adapt pg_dump/gpcheckcat to replicated table
        gp_distribution_policy is no longer a master-only catalog; do the
        same check as for other catalogs.
      
      * Support gpexpand on replicated table && alter the dist policy of replicated table
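
      A minimal sketch, assuming the DISTRIBUTED REPLICATED clause that exposes
      this policy in DDL (table and column names are made up):

      ```sql
      -- Every segment stores a full copy of the table.
      CREATE TABLE dim_country (code text, name text) DISTRIBUTED REPLICATED;

      -- The policy type is visible in the catalog (see the gp_distribution_policy
      -- example shown earlier in this log).
      SELECT policytype FROM gp_distribution_policy
       WHERE localoid = 'dim_country'::regclass;
      ```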
  29. 02 Mar 2018, 1 commit
  30. 01 Feb 2018, 1 commit
    • Fix COPY PROGRAM issues · d6bd4ac4
      Authored by Adam Lee
      1. The pipes might not exist in close_program_pipes(); check for that.
         For instance, if the relation doesn't exist, the copy workflow fails
         before executing the program and "cstate->program_pipes->pid"
         dereferences NULL.

      2. The program might still be running, or hung, when copy exits; kill it.
         This covers cases like the program hanging, not taking signals, and
         the user trying to cancel. Since it's already the end of the copy,
         and the program was started by copy, it should be safe to kill it to
         clean up.
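
      Usage sketch of the COPY ... PROGRAM form that these fixes harden (table t,
      the programs and the paths are illustrative):

      ```sql
      -- The external program is started by COPY; if it hangs or the relation
      -- doesn't exist, the fixes above ensure it is killed / not dereferenced.
      COPY t FROM PROGRAM 'cat /tmp/data.txt';
      COPY t TO PROGRAM 'gzip > /tmp/t.txt.gz';
      ```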
  31. 30 Jan 2018, 2 commits
    • Add hook to handle query info · 49b9bbc8
      Authored by Wang Hao
      The hook is called for
       - each query Submit/Start/Finish/Abort/Error
       - each plan node, on executor Init/Start/Finish
      
      Author: Wang Hao <haowang@pivotal.io>
      Author: Zhang Teng <tezhang@pivotal.io>
    • Alloc Instrumentation in Shmem · 67db4274
      Authored by Wang Hao
      On postmaster start, additional space in Shmem is allocated for Instrumentation
      slots and a header. The number of slots is controlled by a cluster-level GUC;
      the default is 5MB (approximately 30K slots). The default number is estimated
      as 250 concurrent queries * 120 nodes per query. If the slots are exhausted,
      instruments are allocated in local memory as a fallback.
      
      These slots are organized as a free list:
        - The header points to the first free slot.
        - Each free slot points to the next free slot.
        - The last free slot's next pointer is NULL.
      
      ExecInitNode calls GpInstrAlloc to pick an empty slot from the free list:
        - The free slot pointed to by the header is picked.
        - The picked slot's next pointer is assigned to the header.
        - A spin lock is held on the header to prevent concurrent writes.
        - When GUC gp_enable_query_metrics is off, Instrumentation will
          be allocated in local memory.
      
      Slots are recycled by a resource owner callback function.
      
      A benchmark with TPC-DS shows the performance impact of this commit is less than 0.1%.
      To improve the performance of instrumenting, the following optimizations are added:
        - Introduce instrument_option to skip CDB info collection
        - Optimize tuplecount in Instrumentation from double to uint64
        - Replace instrument tuple entry/exit function with macro
        - Add need_timer to Instrumentation, to allow eliminating of timing overhead.
          This is porting part of upstream commit:
      ------------------------------------------------------------------------
      commit af7914c6
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Tue Feb 7 11:23:04 2012 -0500
      
      Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
      ------------------------------------------------------------------------
      
      Author: Wang Hao <haowang@pivotal.io>
      Author: Zhang Teng <tezhang@pivotal.io>