1. 19 Jun 2018 (5 commits)
  2. 16 Jun 2018 (1 commit)
    • Fix incorrect modification of storageAttributes.compress. · 7c82d50f
      Committed by Ashwin Agrawal
For CO tables, storageAttributes.compress only conveys whether block
      compression should be applied. RLE is performed as stream compression
      within the block, so storageAttributes.compress being true or false does
      not relate to RLE at all. With rle_type compression,
      storageAttributes.compress is true for compression levels > 1, where
      block compression is performed along with stream compression. For
      compression level 1, storageAttributes.compress is always false, as no
      block compression is applied. Since RLE does not relate to
      storageAttributes.compress, there is no reason to modify it based on
      rle_type compression.
      
      The problem also manifests because the datumstream layer uses the
      AppendOnlyStorageAttributes in DatumStreamWrite (`acc->ao_attr.compress`)
      to decide the block type, whereas the cdb storage layer functions use the
      AppendOnlyStorageAttributes from AppendOnlyStorageWrite
      (`idesc->ds[i]->ao_write->storageAttributes.compress`). Given this
      difference, changing just one of them, and unnecessarily at that, is
      bound to cause issues during insert.
      
      So, remove the unnecessary and incorrect update of
      AppendOnlyStorageAttributes.
      
      The test case demonstrates the failing scenario without the patch.
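      A minimal sketch of the rule described above, assuming a hypothetical
      helper (this is not the actual GPDB function, just the decision the
      message describes):
      
      	#include <stdbool.h>
      	#include <string.h>
      
      	/* Block compression applies for rle_type only at levels > 1;
      	 * at level 1, RLE is stream compression inside the block only. */
      	static bool
      	should_block_compress(const char *compresstype, int compresslevel)
      	{
      		if (strcmp(compresstype, "rle_type") == 0)
      			return compresslevel > 1;
      		/* other compression types: block compression whenever set */
      		return compresstype[0] != '\0';
      	}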
  3. 14 Jun 2018 (1 commit)
  4. 13 Jun 2018 (1 commit)
  5. 12 Jun 2018 (1 commit)
    • Add more files to gitignore · fe69bd9f
      Committed by Jim Doty
      When cloning a fresh copy of GPDB, running through the documented make
      process, and then running the make target for the demo cluster, there
      are three files that get generated. This commit adds those files to the
      .gitignore files in their respective directories.
Authored-by: Jim Doty <jdoty@pivotal.io>
  6. 11 Jun 2018 (2 commits)
  7. 08 Jun 2018 (1 commit)
    • resgroup: load resgroup settings even for bypassed queries. · 92ffdcb7
      Committed by Ning Yu
`SHOW memory_spill_ratio` always displays 20 when it is the first query
      in a connection (if you run this query in psql and pressed TAB while
      entering the command, the implicit queries run by the tab-completion
      function will be the first). The root cause is that the SHOW command is
      bypassed in resource group mode, so the bound resource group is not
      assigned and the group's settings are not loaded.
      
      To display the proper value in this case, we now load the resource
      group settings even for bypassed queries.
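      A rough sketch of the bypass path described above; every symbol here
      (ShouldBypassQuery, DecideResGroupId, ApplyResGroupSettings) is an
      assumed name for illustration, not the actual GPDB API:
      
      	/* On statement start in resource group mode (sketch) */
      	if (ShouldBypassQuery(queryString))
      	{
      		/*
      		 * Previously we returned here without touching the group; now
      		 * load its settings first so SHOW reports the correct values.
      		 */
      		Oid		groupId = DecideResGroupId();
      
      		ApplyResGroupSettings(groupId);	/* hypothetical helper */
      		return;	/* still skip slot acquisition for bypassed queries */
      	}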
  8. 07 Jun 2018 (4 commits)
    • User can see cpu usage on cpuset groups (#5115) · 62a12801
      Committed by Jialun
CPU usage of cpuset groups should also be displayed in
      gp_toolkit.gp_resgroup_status.
    • Fix teardown issue of TCP interconnect · 1c1f1644
      Committed by Pengzhou Tang
Previously, for an interconnect connection, when no more data was
      available at the sender, the sender sent a customized EOS packet to the
      receiver, disabled further sends with shutdown(SHUT_WR), and then
      immediately closed the connection with close(), counting on the kernel
      TCP stack to guarantee that the data was delivered to the receiver. The
      problem is that on some platforms, once the connection has been closed
      on one side, TCP behavior is undetermined: packets may be lost and the
      receiver may report an unexpected error.
      
      The correct way is for the sender to block on the connection until the
      receiver has received the EOS packet and closed its end; then the
      sender can close the connection safely.
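      The pattern the fix describes is the classic graceful TCP teardown. A
      minimal sketch with plain sockets (send_eos_packet() is a hypothetical
      stand-in for the interconnect's customized EOS; this is not the actual
      GPDB code):
      
      	#include <sys/socket.h>
      	#include <unistd.h>
      
      	static void
      	sender_teardown(int fd)
      	{
      		char	buf[64];
      
      		send_eos_packet(fd);	/* hypothetical: send the EOS packet */
      		shutdown(fd, SHUT_WR);	/* no further sends; FIN to peer */
      
      		/*
      		 * Block until the receiver closes its end: read() returning 0
      		 * means the peer got our EOS and closed, so close() is safe.
      		 */
      		while (read(fd, buf, sizeof(buf)) > 0)
      			;
      		close(fd);
      	}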
    • Fix hang issue due to result node not squelching outer node explicitly · 2c011ce4
      Committed by Pengzhou Tang
For a result node with a one-time filter, if its outer plan is not
      empty and contains a motion node, it needs to squelch the outer node
      explicitly when the one-time filter evaluates to false. This matters
      especially for a motion node underneath: ExecSquelchNode() forces a
      stop message so the interconnect sender does not get stuck resending
      or polling for ACKs.
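      A sketch of the fix in simplified ExecResult-style logic; the field and
      signature details are assumptions from upstream Postgres of that era,
      not the exact GPDB patch:
      
      	/* One-time filter check on the result node's first call */
      	if (!ExecQual(node->resconstantqual, econtext, false))
      	{
      		/*
      		 * No rows will be produced, but a motion node below may have
      		 * senders blocked on sends or ACKs; stop them explicitly.
      		 */
      		if (outerPlanState(node) != NULL)
      			ExecSquelchNode(outerPlanState(node));
      		return NULL;	/* empty result */
      	}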
    • skip wal sender for ProcessStartupPacket fault · f48cd729
      Committed by Pengzhou Tang
This is a quick fix to make the dispatch test pass; in the long term
      we need to redesign the dispatch test or make it a unit test.
  9. 06 Jun 2018 (7 commits)
    • Fix coverity issue CID 186433. · c1372ce6
      Committed by Ashwin Agrawal
    • Disable GDD scan for dispatch tests · fd54a398
      Committed by Pengzhou Tang
Dispatch tests don't expect backends created by other tests or by
      auxiliary processes like FTS and GDD, so this commit disables GDD as
      well to make the dispatch tests stable.
    • Fix potential bugs reported by CoverityScan (#5105) · eb3d124b
      Committed by Jialun
- Change strncpy to StrNCpy to make sure the destination string is
        NUL-terminated
      - Initialize some variables before using them
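      The difference matters because strncpy() does not NUL-terminate the
      destination when the source fills the buffer, while Postgres' StrNCpy()
      macro always does. A minimal illustration:
      
      	#include <string.h>
      	#include "c.h"		/* Postgres' StrNCpy() macro */
      
      	void
      	demo(void)
      	{
      		char	dst[8];
      
      		strncpy(dst, "0123456789", sizeof(dst));	/* NOT terminated */
      		StrNCpy(dst, "0123456789", sizeof(dst));	/* "0123456" + '\0' */
      	}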
    • Oops · c8f6891d
      Committed by Jesse Zhang
Commit 1c1945fd9dbaf217062596062f73beac4934d7b6 broke compilation when
      we use the trivial / dummy implementation of resource group. The fix
      for that is trivial (this commit). But it begs the question: should we
      make the build system less magical (switching the implementation based
      on the platform), and instead just always exercise the dummy
      implementation (or at least the building of it)?
    • Efficient deletion of AO/CO files. · 8838ac98
      Committed by Ashwin Agrawal
The previous algorithm scanned the entire directory to find the
      specific relfilenode extensions to delete. This is not optimal for
      large directories. This patch introduces extra logic based on the
      table extension pattern, which avoids the directory scan.
      
      The algorithm is coded on the assumption that for CO tables, at a
      given concurrency level, either all columns have the file or none do,
      and that files follow the extension pattern below:
      
        Heap tables: contiguous extensions, no upper bound
        AO tables:   non-contiguous extensions [.0 - .127]
        CO tables:   non-contiguous extensions
               [  .0 - .127] for the first column
               [.128 - .255] for the second column
               [.256 - .383] for the third column
               etc.
      
        The AO file format can be treated as a special case of a CO table
        with one column.
      
      High-level logic:
       1) Find the concurrency levels for which the table has files. This is
          calculated based off the first column. It performs 127
          (MAX_AOREL_CONCURRENCY) unlink() calls.
       2) Iterate over the single column and delete the files of all
          concurrency levels found. For AO tables this exits fast.
      
      This algorithm could be used for heap tables as well, but to prevent
      merge conflicts it is currently only used for AO/CO tables. A sketch
      of the segfile numbering follows this entry.
Co-authored-by: David Kimura <dkimura@pivotal.io>
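      A sketch of the two-step deletion under the stated layout assumptions;
      ao_segfile_no() and unlink_co_relation() are illustrative names, not
      the actual functions (the real logic lives in mdunlink_ao() in aomd.c):
      
      	#include <stdbool.h>
      	#include <stdio.h>
      	#include <unistd.h>
      
      	#define MAX_AOREL_CONCURRENCY	128
      
      	/* Segment file number of column `col` at concurrency level `level` */
      	static int
      	ao_segfile_no(int col, int level)
      	{
      		return col * MAX_AOREL_CONCURRENCY + level;
      	}
      
      	static void
      	unlink_co_relation(const char *path, int ncols)
      	{
      		bool	level_exists[MAX_AOREL_CONCURRENCY] = {false};
      		char	fname[1024];
      
      		/* Step 1: probe the first column; one unlink() per level. */
      		for (int level = 1; level < MAX_AOREL_CONCURRENCY; level++)
      		{
      			snprintf(fname, sizeof(fname), "%s.%d",
      					 path, ao_segfile_no(0, level));
      			level_exists[level] = (unlink(fname) == 0);
      		}
      
      		/* Step 2: for the other columns, unlink only the levels found. */
      		for (int col = 1; col < ncols; col++)
      			for (int level = 1; level < MAX_AOREL_CONCURRENCY; level++)
      				if (level_exists[level])
      				{
      					snprintf(fname, sizeof(fname), "%s.%d",
      							 path, ao_segfile_no(col, level));
      					unlink(fname);
      				}
      	}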
    • Pass relstorage type to smgr layer. · 85fee736
      Committed by Ashwin Agrawal
Without this patch, the storage layout is not known in the md and smgr
      layers. Due to the lack of this info, sub-optimal operations have to
      be performed generically for all table types. For example,
      heap-specific functions like ForgetRelationFsyncRequests() and
      DropRelFileNodeBuffers() get called even for AO and CO tables.
      
      Add a new RelFileNodeWithStorageType struct to pass the storage type
      to the md and smgr layers. The XLOG_XACT_COMMIT and XLOG_XACT_ABORT
      WAL records use the new structure, which carries the RelFileNode and
      the storage type.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
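      A sketch of what such a struct looks like; the exact field names are
      assumptions, not the actual GPDB definition:
      
      	#include "storage/relfilenode.h"	/* RelFileNode */
      
      	typedef struct RelFileNodeWithStorageType
      	{
      		RelFileNode node;			/* spcNode / dbNode / relNode */
      		char		relstorage;		/* e.g. RELSTORAGE_HEAP,
      									 * RELSTORAGE_AOROWS,
      									 * RELSTORAGE_AOCOLS */
      	} RelFileNodeWithStorageType;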
    • Optimize and correct copy_append_only_data(). · b3aff72d
      Committed by Ashwin Agrawal
Altering a table's tablespace needs to copy all of the table's
      underlying files from one tablespace to the other. For AO/CO tables
      this was implemented using a full directory scan to find and copy the
      files when persistent tables were removed. That is very inefficient,
      and its performance varies with the number of files in the directory.
      Instead, use the same optimization as `mdunlink_ao()`, leveraging the
      known file layout of AO/CO tables.
      
      The old logic also had a couple of bugs:
      - it missed copying the base (.0) file, which means data loss if the
        table had been altered in the past;
      - it wrote xlog even for temp tables.
      
      These are fixed as well with this patch. Additional tests are added to
      cover those missing scenarios. Also, the AO-specific code moved out of
      tablecmds.c into aomd.c to reduce conflicts with upstream.
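      A sketch of the copy loop implied by the mdunlink_ao()-style layout;
      copy_segfile_if_exists() is a hypothetical helper, and the real code
      also handles xlogging (skipped for temp tables) and error paths:
      
      	/* Copy every possible segfile of every column, including the
      	 * base ".0" file of column 0 that the old logic missed. */
      	for (int col = 0; col < ncols; col++)
      		for (int level = 0; level < MAX_AOREL_CONCURRENCY; level++)
      		{
      			int		segno = col * MAX_AOREL_CONCURRENCY + level;
      
      			copy_segfile_if_exists(srcpath, dstpath, segno);
      		}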
  10. 05 Jun 2018 (3 commits)
    • Implement CPUSET (#5023) · 0c0782fe
      Committed by Jialun
* Implement CPUSET, a new way to manage CPU resources in resource
      groups, which reserves the specified cores exclusively for a given
      resource group. This ensures that CPU resources are always available
      for a group that has CPUSET set. The most common scenario is
      allocating fixed cores for short queries.
      
      - One can use it by executing CREATE RESOURCE GROUP xxx WITH (
        cpuset='0-1', xxxx); '0-1' is the set of CPU cores reserved for
        this group. Or use ALTER RESOURCE GROUP xxx SET CPUSET '0,1' to
        modify the value.
      - The CPUSET syntax is a comma-separated list of tuples; each tuple
        is a single core number or an interval of core numbers, e.g.
        0,1,2-3. All cores in a CPUSET must be available in the system,
        and the core sets of different groups cannot overlap. (A toy
        parser for this syntax follows this entry.)
      - CPUSET and CPU_RATE_LIMIT are mutually exclusive: one cannot
        create a resource group with both. However, a group can be
        switched freely between them with ALTER; setting one feature
        disables the other.
      - The CPU cores are returned to GPDB when the group is dropped,
        when the CPUSET value is changed, or when CPU_RATE_LIMIT is set.
      - If some cores have been allocated to a resource group, then
        CPU_RATE_LIMIT in other groups indicates a percentage of the
        remaining CPU cores only.
      - Even if GPDB is busy and all cores not exclusively allocated
        through CPUSET have been used up, the cores in a CPUSET will
        still not be allocated to other groups.
      - The cores in a CPUSET are exclusive only at the GPDB level;
        non-GPDB processes in the system may still use them.
      - Add test cases for this new feature. The test environment must
        contain at least two CPU cores, so we upgraded the instance_type
        configuration of the resource_group jobs.
      
      * - Be compatible with the case where the cgroup directory
        cpuset/gpdb does not exist
      - Implement pg_dump support for cpuset & memory_auditor
      - Fix a typo
      - Change the default cpuset value from an empty string to -1,
        because the code in 5X assumes that every default value in
        resource group is an integer; a non-integer value would make the
        system fail to start
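      A toy parser for the cpuset syntax described above ("0,1,2-3": cores
      or core intervals separated by commas); this is a sketch only, not the
      GPDB parser:
      
      	#include <stdbool.h>
      	#include <stdio.h>
      	#include <string.h>
      
      	/* Mark mask[i] = true for every core in `spec`; -1 on bad input. */
      	static int
      	parse_cpuset(const char *spec, bool mask[], int ncores)
      	{
      		char	buf[256];
      		char   *tok, *save;
      		int		lo, hi;
      
      		snprintf(buf, sizeof(buf), "%s", spec);
      		for (tok = strtok_r(buf, ",", &save); tok != NULL;
      			 tok = strtok_r(NULL, ",", &save))
      		{
      			if (sscanf(tok, "%d-%d", &lo, &hi) == 2)
      				;					/* interval tuple, e.g. "2-3" */
      			else if (sscanf(tok, "%d", &lo) == 1)
      				hi = lo;			/* single-core tuple, e.g. "0" */
      			else
      				return -1;			/* malformed tuple */
      
      			if (lo < 0 || hi >= ncores || lo > hi)
      				return -1;			/* core not available in system */
      			while (lo <= hi)
      				mask[lo++] = true;
      		}
      		return 0;
      	}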
    • Remove incorrect fixme about temp tables · 41dad081
      Committed by Asim R P
Temp tables must be included in PREPARE and COMMIT records in GPDB
      because, unlike in upstream, they are not exempt from 2PC.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
    • Always generate relfilenode for about-to-be-created relations · a549a53c
      Committed by Asim R P
We have found the culprit causing relfilenode collisions: VACUUM FULL
      on a mapped relation.  The code was reusing the OID as the relfilenode
      for the temporary table created by VACUUM FULL, without bumping the
      relfilenode counter.  The patch fixes this such that a relfilenode is
      always generated, even in the case of mapped relations.
      
      With this, we believe the possibility of collision still exists in the
      way sequence OIDs are generated.  That needs to be fixed in a separate
      patch; the fixme in GetNewRelFileNode() should be sufficient to note
      this.
      Co-authored-by: David Kimura <dkimura@pivotal.io>
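      A sketch of the behavioral change; the GetNewRelFileNode() signature
      shown is the upstream Postgres one of that era and may differ in GPDB:
      
      	/*
      	 * Before: for mapped relations, VACUUM FULL reused the relation OID
      	 * as the relfilenode of its transient table, without bumping the
      	 * relfilenode counter.  After: always draw a fresh relfilenode.
      	 */
      	Oid
      	new_relfilenode_for(Relation rel)
      	{
      		return GetNewRelFileNode(rel->rd_rel->reltablespace,
      								 NULL,	/* pg_class, unused in sketch */
      								 rel->rd_rel->relpersistence);
      	}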
  11. 04 Jun 2018 (3 commits)
  12. 02 Jun 2018 (1 commit)
    • Fix some incorrectly merged code (#5084) · 50903a55
      Committed by Ashwin Agrawal
* Remove redundant copy of the toast table and its index in ATExecSetTableSpace()
      
      Commit f70f49fe introduced this double copy of the toast table and
      its index. Let's fix it.
      
      * Fix mismerged lines in src/interfaces/libpq/Makefile.
      
      Author: Ashwin Agrawal <aagrawal@pivotal.io>
  13. 01 Jun 2018 (4 commits)
    • Add tests for GPDB specific collation creation · e0845912
      Committed by Taylor Vesely
Unlike upstream, GPDB needs to keep collations in sync between
      multiple databases. Add tests for GPDB-specific collation behavior.
      
      These tests need to import a system locale, so add a @syslocale@
      variable to gpstringsubs.pl in order to test the creation/deletion of
      collations from system locales.
      Co-authored-by: Jim Doty <jdoty@pivotal.io>
    • Add dispatch to collation creation commands · d73a185b
      Committed by Taylor Vesely
Make CREATE COLLATION and pg_import_system_collations() parallel-aware
      by dispatching collation creation to the QEs.
      
      For collations to work correctly, we need to be sure that every
      collation created on the QD is also installed on the QEs, and that
      the OID matches in every database. We take advantage of two-phase
      commit to prevent a collation from being created if there is a
      problem adding it on any QE. In upstream, collations are created
      during initdb, but this won't work for GPDB, because while initdb is
      running there is no way to be sure that every segment has the same
      locales installed.
      
      We disable collation creation during initdb and make it the
      responsibility of the system administrator to initialize any needed
      collations, either by running a CREATE COLLATION command or by
      running the pg_import_system_collations() UDF.
      Co-authored-by: Jim Doty <jdoty@pivotal.io>
    • Updated version of pg_import_system_collations() · 91d65139
      Committed by Tom Lane
Pull in a more recent version of pg_import_system_collations() from
      upstream. We have not pulled in the ICU collations, so the sections
      of code that deal with them are removed wholesale.
      
      This commit is primarily a cherry-pick of 0b13b2a7, but also pulls
      in prerequisite changes for CollationCreate().
      
      	Rethink behavior of pg_import_system_collations().
      
      	Marco Atzeri reported that initdb would fail if "locale -a" reported
      	the same locale name more than once.  All previous versions of Postgres
      	implicitly de-duplicated the results of "locale -a", but the rewrite
      	to move the collation import logic into C had lost that property.
      	It had also lost the property that locale names matching built-in
      	collation names were silently ignored.
      
      	The simplest way to fix this is to make initdb run the function in
      	if-not-exists mode, which means that there's no real use-case for
      	non if-not-exists mode; we might as well just drop the boolean argument
      	and simplify the function's definition to be "add any collations not
      	already known".  This change also gets rid of some odd corner cases
      	caused by the fact that aliases were added in if-not-exists mode even
      	if the function argument said otherwise.
      
      	While at it, adjust the behavior so that pg_import_system_collations()
      	doesn't spew "collation foo already exists, skipping" messages during a
      	re-run; that's completely unhelpful, especially since there are often
      	hundreds of them.  And make it return a count of the number of collations
      	it did add, which seems like it might be helpful.
      
      	Also, re-integrate the previous coding's property that it would make a
      	deterministic selection of which alias to use if there were conflicting
      	possibilities.  This would only come into play if "locale -a" reports
      	multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
      	but that hardly seems out of the question.
      
      	In passing, fix incorrect behavior in pg_import_system_collations()'s
      	ICU code path: it neglected CommandCounterIncrement, which would result
      	in failures if ICU returns duplicate names, and it would try to create
      	comments even if a new collation hadn't been created.
      
      	Also, reorder operations in initdb so that the 'ucs_basic' collation
      	is created before calling pg_import_system_collations() not after.
      	This prevents a failure if "locale -a" were to report a locale named
      	that.  There's no reason to think that that ever happens in the wild,
      	but the old coding would have survived it, so let's be equally robust.
      
      	Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
      	(cherry picked from commit 0b13b2a7)
    • Add function to import operating system collations · 7dee9e44
      Committed by Peter Eisentraut
      Move this logic out of initdb into a user-callable function.  This
      simplifies the code and makes it possible to update the standard
      collations later on if additional operating system collations appear.
Reviewed-by: Andres Freund <andres@anarazel.de>
      Reviewed-by: Euler Taveira <euler@timbira.com.br>
      (cherry picked from commit aa17c06f)
  14. 30 May 2018 (1 commit)
    • Refine the fault injector framework (#5013) · 723e5848
      Committed by Tang Pengzhou
* Refine the fault injector framework
      
      * Add a counting feature so a fault can be triggered N times.
      * Add a simpler variant named gp_inject_fault_infinite.
      * Refine and clean up the code, including renaming sleepTimes
        to extraArg so it can be used by other fault types.
      
      Three functions are now provided:
      
      1. gp_inject_fault(faultname, type, ddl, database, tablename,
      					start_occurrence, end_occurrence, extra_arg, db_id)
      start_occurrence: nth occurrence at which the fault starts triggering
      end_occurrence: nth occurrence at which the fault stops triggering;
      -1 means the fault keeps triggering until it is reset.
      
      2. gp_inject_fault(faultname, type, db_id)
      a simpler version for a fault triggered only once.
      
      3. gp_inject_fault_infinite(faultname, type, db_id)
      a simpler version for a fault that keeps triggering until it is
      reset.
      
      * Fix the bgwriter_checkpoint case
      
      * Use gp_inject_fault_infinite here instead of gp_inject_fault, so
        the pg_proc cache entry containing gp_inject_fault_infinite is
        loaded before the checkpoint and the following
        gp_inject_fault_infinite call doesn't dirty the buffer again.
      * Add a matchsubs block to ignore 5 or 6 hits of fsync_counter.
      
      * Fix the flaky twophase_tolerance_with_mirror_promotion test
      
      * Use different sessions for Scenario 2 and Scenario 3, because the
        gang of session 2 is no longer valid.
      * Wait for the wanted fault to be triggered so that no unexpected
        error occurs.
      
      * Add more segment status info to identify errors quickly
      
      Some cases run right behind FTS test cases. If the segments are not
      in the desired status, those test cases fail unexpectedly, so this
      commit adds more debug info at the beginning of the test cases to
      help identify issues quickly.
      
      * Enhance cases to skip FTS probe for sure
      
      * Do the FTS probe request twice to guarantee the FTS error is
        triggered.
  15. 29 May 2018 (2 commits)
    • Support RETURNING for replicated tables. · fb7247b9
      Committed by Ning Yu
* rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering a table from/to replicated had no
      effect; the root cause is that we neither changed
      gp_distribution_policy nor reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with
      the new dist policy and transferring all the data to it.
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated
      table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;
      
A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; in this motion type, data is received from one explicit
      sender.
      
      * rpt: fix motion type under explicit gather motion.
      
Consider the query below:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
A correct plan looks like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
* rpt: add test cases for tables with both PRIMARY KEY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to keep this feature working during
      future development.
      
      (cherry picked from commit 72af4af8)
    • Preserve persistence when reorganizing temp tables. · 0ce07109
      Committed by Ning Yu
When altering a table's distribution policy we might need to
      reorganize the data by creating a __temp__ table, copying the data to
      it, and then swapping the underlying relation files.  However, we
      always created the __temp__ table as permanent, so when the original
      table was temp the underlying files could not be found by later
      queries:
      
      	CREATE TEMP TABLE t1 (c1 int, c2 int) DISTRIBUTED BY (c1);
      	ALTER TABLE t1 SET DISTRIBUTED BY (c2);
      	SELECT * FROM t1;
  16. 28 May 2018 (2 commits)
    • Revert "Support RETURNING for replicated tables." · a74875cd
      Committed by Ning Yu
      This reverts commit 72af4af8.
    • Support RETURNING for replicated tables. · 72af4af8
      Committed by Ning Yu
* rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering a table from/to replicated had no
      effect; the root cause is that we neither changed
      gp_distribution_policy nor reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with
      the new dist policy and transferring all the data to it.
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated
      table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;
      
A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; in this motion type, data is received from one explicit
      sender.
      
      
      * rpt: fix motion type under explicit gather motion.
      
Consider the query below:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
A correct plan looks like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      
* rpt: add test cases for tables with both PRIMARY KEY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to keep this feature working during
      future development.
  17. 26 May 2018 (1 commit)