1. 02 Aug 2019, 3 commits
    • Support GIN Indexes with ORCA. · eae823f4
      Bhuvnesh Chaudhary authored
      This commit adds the GPDB-side changes required to support GIN indexes
      with ORCA.  It also adds a new test file, gp_gin_indexes, to test plans
      produced by ORCA and the planner.

      GIN indexes are not supported with index expressions or predicate
      constraints; ORCA does not currently support those for other index
      types either.
    • Unify backend/access/gin unittest infrastructure with other unit tests (#8275) · fc88b4ee
      Ivan Leskin authored
      GPDB has a common pipeline that is used for most unit tests.

      However, the unit tests for src/backend/access/gin introduced by 99360f54
      used a custom implementation of the unit test build script. This led to
      errors, e.g., when a compiler other than GCC was used to build GPDB.
      
      Rewrite the Makefile to unify the test infrastructure with the common
      pattern used in the backend, while retaining test isolation from the
      backend objects.
      
      See also similar Makefile: src/backend/catalog/test/Makefile
      at 122c79f2
      
      Note that the Makefile in src/backend/access/gin/test differs from the
      currently most used version of a backend unit test Makefile. These
      differences and the motivation for them are described in the README.
      
      Run pgindent on ginpostlist_fakes.c
      Reviewed-by: Adam Berlin <aberlin@pivotal.io>
      Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Ivan Leskin <leskin.in@arenadata.io>
    • Fix aocs table block version mismatch (#8202) · 41fd823a
      David Kimura authored
      ALTER TABLE DROP COLUMN followed by a reorganize loses the column
      encoding settings of the dropped column. When the dropped column's
      compresstype encoding is incorrect, we can hit a block version mismatch
      error later, during the block info validation check of that column.
      
      One idea was to skip dropped columns when constructing AOCSScanDesc.
      However, dropping all columns is a special case that is not easily handled
      because it is not equivalent to deleted rows. Instead, the fix is to preserve
      column encoding settings even for dropped columns.
      Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
      Co-authored-by: Ivan Leskin <leskin.in@arenadata.io>
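      As a sketch of the approach (illustrative only: the accessor matches the
      9.4-era TupleDesc layout and preserve_column_encoding() is a
      hypothetical helper, not the actual patch):

      ```c
      /*
       * Rebuild per-column encoding entries for a reorganized AOCS table.
       * Note: no "if (att->attisdropped) continue;" here.  Skipping dropped
       * columns is exactly what lost their compresstype settings and caused
       * the block version mismatch during block info validation.
       */
      for (int attno = 0; attno < tupdesc->natts; attno++)
      {
          Form_pg_attribute att = tupdesc->attrs[attno];

          preserve_column_encoding(relid, att->attnum);   /* hypothetical */
      }
      ```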
  2. 01 Aug 2019, 1 commit
  3. 31 Jul 2019, 5 commits
    • Fix gpperfmon GUC definitions · b5729a03
      Asim R P authored
      The GUC gp_enable_gpperfmon is defined to be set only at postmaster
      restart.  Having a check hook that checks whether the process setting
      it has superuser privileges is meaningless.  The check hook is removed.

      The GUC gp_gpperfmon_send_interval is intended to be set only by a
      superuser.  Adjust its definition accordingly and leverage the checks
      for superuser privileges built into the GUC framework.  The check hook
      for this GUC tried to achieve the same but did so incorrectly.  If the
      check hook was invoked by a backend process at the beginning of the
      main query processing loop, it would crash: at that point a transaction
      has not been started yet, and the check hook invokes the superuser()
      interface, which performs catalog access.  Doing so without starting a
      transaction is a recipe for crashing badly.  Such a crash was observed
      in production at least once.
      
      Thank you Jesse Zhang for suggesting the removal of the superuser check.

      The patch doesn't add any tests because, after removing the check
      hooks, the checks built into the GUC framework are used.  That code
      path is well exercised by existing regression tests.
      
      Reviewed-by: Daniel Gustafsson
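      A sketch of the resulting definitions, written here with the
      extension-style GUC API for brevity (the real GUCs are built-in entries
      in GPDB's GUC tables; defaults and descriptions below are illustrative):

      ```c
      /* Settable only at server start: a superuser check hook is moot. */
      DefineCustomBoolVariable("gp_enable_gpperfmon",
                               "Enable gpperfmon monitoring.",
                               NULL,
                               &gp_enable_gpperfmon,
                               false,
                               PGC_POSTMASTER,
                               0,
                               NULL, NULL, NULL);

      /*
       * PGC_SUSET: the GUC framework itself rejects non-superusers, without
       * calling superuser(), which performs catalog access and would crash
       * when invoked before a transaction has started.
       */
      DefineCustomIntVariable("gp_gpperfmon_send_interval",
                              "Interval in seconds between gpperfmon packets.",
                              NULL,
                              &gp_gpperfmon_send_interval,
                              1, 1, 3600,
                              PGC_SUSET,
                              0,
                              NULL, NULL, NULL);
      ```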
    • Convert bin, sbin and doc in gpMgmt to recursive targets · b5aba18b
      Daniel Gustafsson authored
      Installing the management utilities used to be a pretty brute-force
      operation which copied more or less everything over blindly and then
      tried to remove what shouldn't be installed. This is clearly not a
      terribly clean or sustainable solution, as subsequent issues with it
      have proven (editor save files, patch .rej/.orig files, etc. were
      routinely copied and never purged).
      
      This takes a first stab at turning installation of gpMgmt/bin, sbin
      and doc into proper recursive make targets which only install the
      files that were intended to be installed.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/8179
      Reviewed by Bradford Boyle, Kalen Krempely, Jamie McAtamney and
      many more
    • Create a sub memory context for serialization functions · f780ca11
      Adam Lee authored
      And reset it afterwards, to make sure no memory is leaked there.

      For instance, the deserialized transValues for new entries (which are
      not that temporary) are not in group_buf, and were not freed.
      Co-authored-by: Ning Yu <nyu@pivotal.io>
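      A minimal sketch of the pattern, assuming an illustrative context name
      and placement:

      ```c
      /* One sub-context dedicated to the serial/deserial calls. */
      MemoryContext serial_ctx =
          AllocSetContextCreate(CurrentMemoryContext,
                                "SerializeAggContext",
                                ALLOCSET_DEFAULT_MINSIZE,
                                ALLOCSET_DEFAULT_INITSIZE,
                                ALLOCSET_DEFAULT_MAXSIZE);
      MemoryContext oldctx = MemoryContextSwitchTo(serial_ctx);

      /* ... run the serialization/deserialization functions; anything they
       * allocate (e.g. deserialized transValues that never reach group_buf)
       * now lands in serial_ctx ... */

      MemoryContextSwitchTo(oldctx);
      MemoryContextReset(serial_ctx);   /* reclaims it all, leak or not */
      ```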
    • Use mpool to allocate memory for aggregate transition data · c9258ae3
      Adam Lee authored
      So we can account for its usage via GET_TOTAL_USED_SIZE(hashtable) and
      decide if it's time to spill.

      It is also no longer necessary to pfree() the transValue, since it is
      in the mpool and will be reset after a spill.
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Richard Guo <riguo@pivotal.io>
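      Roughly (a sketch based on the mpool API used by GPDB's hashagg; field
      and helper names are simplified):

      ```c
      /* Allocate the transition value from the hash table's mpool so that
       * GET_TOTAL_USED_SIZE(hashtable) accounts for it. */
      void *newValue = mpool_alloc(hashtable->group_buf, transValueLen);
      memcpy(newValue, transValueBytes, transValueLen);

      /* The spill decision now sees the transition data's footprint. */
      if (GET_TOTAL_USED_SIZE(hashtable) >= max_mem)
          spill_hash_table(hashtable);   /* the mpool is reset after the
                                          * spill, so no per-value pfree() */
      ```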
    • Serialize the aggstates while spilling hash table · 6dadce04
      Adam Lee authored
      AggStates are pointers allocated in aggcontext with type INTERNAL; just
      spilling the pointers does not decrease memory usage, and combining
      states without freeing them can leak memory.

      This commit serializes the aggstates, writes the real data into the
      spill file, and frees the memory.
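      Conceptually, the spill path changes as below (a sketch, not the actual
      code; the pertrans/pergroup field names are approximate):

      ```c
      /* Spilling the raw INTERNAL pointer writes 8 meaningless bytes and
       * leaves the state itself in aggcontext.  Serialize it instead, write
       * the real bytes, and free the in-memory state. */
      bytea *buf = DatumGetByteaP(FunctionCall1(&pertrans->serialfn,
                                                pergroup->transValue));

      BufFileWrite(spill_file, VARDATA(buf), VARSIZE(buf) - VARHDRSZ);

      pfree(DatumGetPointer(pergroup->transValue));  /* really frees memory */
      pergroup->transValue = (Datum) 0;
      ```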
  4. 30 Jul 2019, 1 commit
    • resgroup: hashagg: revert operator memory auto enlarging · 3827546a
      Ning Yu authored
      In resource group mode we once introduced automatic operator memory
      enlargement logic for hashagg; the point was to let hashagg fail on an
      actual OOM instead of on a soft quota check, which helps hashagg run
      successfully with an initially low operator memory.

      However, the auto-enlargement logic introduced a bug: hashagg spilling
      could be disabled in resource group mode by accident.

      On the other hand, we introduced a memory_spill_ratio=0 mode in
      resource groups to use statement_mem for operators, which is the same
      behavior as resource queues.  The statement_mem setting is usually
      large enough and fine-tuned by the users; in such a case we do not need
      the auto-enlargement for hashagg, and it is better to keep the old
      quota-checking behavior.
      
      So we revert the hashagg-related changes from the commits below:
      
      - 40d955d6 Rid resource group on hashagg spill evaluation (#8199)
      - ede74cdc resgroup: reduce log level for operator memory overuse
      - f053e6cd resgroup: allow memory overuse for hashagg spill meta data
      - 90795402 resgroup: allow operators enlarge their memory quota
      Reviewed-by: Adam Lee <ali@pivotal.io>
      Reviewed-by: Weinan WANG <wewang@pivotal.io>

      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/30hiTArxsgo/aydVMcrXBQAJ
  5. 26 Jul 2019, 4 commits
    • Emit warning that an injector of a panic should consider disabling FTS. · fbd0f091
      Adam Berlin authored
      - Adds a hint to the user on how to disable FTS.

      - Registers warnings outside of the core injection framework.

        This allows us to create GPDB-specific warnings, for things like
        FTS, without tainting the core framework, which is in theory
        dependent only on Postgres.
    • CREATE DATABASE two-phase commit safe (#8078) · 7c89cde7
      Weinan WANG authored
      When a failure is raised by `CREATE DATABASE`, the relevant files and
      directories should not be left over.

      Add the new database's information to the `pendingDbDeletes` list, so
      that `CREATE DATABASE` is guaranteed to be a 2PC-safe command.
      Co-authored-by: Asim R P <apraveen@pivotal.io>
    • Save the complete slot data in ExecMaterializeSlot() · 1eeae2e9
      Adam Lee authored
      ExecMaterializeSlot() transformed any tuple into a virtual tuple via
      slot_getallattrs() and then formed a heap tuple from it; the ctid was
      lost here, since virtual tuples have no system columns.

      This commit copies the entire htup directly if we have a regular
      physical tuple that is not locally palloc'd.
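      The gist of the change, as a sketch (slot field and macro names are
      approximate):

      ```c
      /* If the slot holds a regular physical tuple that we did not palloc
       * locally, copy the whole HeapTuple.  heap_copytuple() preserves
       * system columns such as ctid, which the old slot_getallattrs() +
       * heap_form_tuple() path dropped. */
      if (slot->PRIVATE_tts_heaptuple != NULL && !TupShouldFree(slot))
      {
          slot->PRIVATE_tts_heaptuple =
              heap_copytuple(slot->PRIVATE_tts_heaptuple);
          TupSetShouldFree(slot);   /* we own the copy now */
          return slot->PRIVATE_tts_heaptuple;
      }
      ```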
    • Propagate external table format options to PXF headers · 91d4e40a
      Francisco Guerrero authored
      Currently, PXF does not propagate the format options in the external
      table framework. In Foreign Data Wrappers, these options are defined at
      the foreign-table level and are propagated to PXF.

      In order for the PXF Server to better support both FDW and external
      tables, we now consistently pass format information from both clients.
  6. 25 Jul 2019, 6 commits
  7. 24 Jul 2019, 2 commits
  8. 23 Jul 2019, 4 commits
    • Refactor and improve cdbpath_motion_for_join · e2731add
      Zhenghua Lyu authored
      This commit refactors the function `cdbpath_motion_for_join` to make it
      clearer and to generate better plans for some cases.

      In a distributed computing system, gathering distributed data into a
      single QE should always be the last choice. Previously, the code for
      the general and segmentgeneral locus types, when not ok_to_replicate,
      would try to gather the other locus to a single QE. This commit
      improves on that by first trying to add a redistribute motion.
      
      The logic for the join result's locus (outer's locus is general):
        1. if outer is ok to replicate, the result's locus is the same
           as inner's locus
        2. if outer is not ok to replicate (like left join or wts cases)
           2.1 if inner's locus is hashed or hashOJ, we try to redistribute
               outer to match inner; if that fails, make inner singleQE
           2.2 if inner's locus is strewn, we try to redistribute
               outer and inner; if that fails, make inner singleQE
           2.3 otherwise, just return the inner's locus; no motion is needed
      (a condensed code sketch of this branch appears after this entry)
      
      The logic for the join result's locus (outer's locus is segmentgeneral):
      - if both are SegmentGeneral:
           1. if both locus are equal, no motion is needed, simply return
           2. for update cases: if the result relation is SegmentGeneral,
              the update must execute on each segment of the result
              relation; if the result relation's numsegments is larger, the
              only solution is to broadcast the other side
           3. otherwise no motion is needed; change both numsegments to the
              common value
      - if only one of them is SegmentGeneral:
           1. consider the update case: if the result relation is
              SegmentGeneral, the only solution is to broadcast the other
              side
           2. if the other side's locus is singleQE or entry, convert
              SegmentGeneral to the other side's locus
           3. the remaining possibility is that the other side's locus is
              partitioned:
              3.1 if SegmentGeneral is not ok_to_replicate, try to add a
                  redistribute motion; if that fails, gather each side to
                  singleQE
              3.2 if SegmentGeneral's numsegments is larger, just return
                  the other side's locus
              3.3 otherwise, try to add a redistribute motion; if that
                  fails, gather each side to singleQE
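      A condensed sketch of the general-outer branch (locus macros from
      cdbpathlocus.h; control flow heavily simplified, and try_redistribute()
      stands in for the redistribution attempt):

      ```c
      if (CdbPathLocus_IsGeneral(outer.locus))
      {
          if (outer.ok_to_replicate)
              return inner.locus;               /* 1: follow the inner side */

          if (CdbPathLocus_IsHashed(inner.locus) ||
              CdbPathLocus_IsHashedOJ(inner.locus) ||
              CdbPathLocus_IsStrewn(inner.locus))
          {
              /* 2.1/2.2: try to redistribute instead of gathering; only if
               * that fails, fall back to a single QE. */
              if (!try_redistribute(root, &outer, &inner))
                  CdbPathLocus_MakeSingleQE(&inner.locus, numsegments);
          }
          /* 2.3: otherwise inner's locus is returned unchanged */
          return inner.locus;
      }
      ```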
    • Remove Replicated Locus in cdbpath_motion_for_join · 20248a31
      Zhenghua Lyu authored
      Locus type Replicated can only be generated by a join operation, and in
      the function cdbpathlocus_join there is a rule:
          `<any locus type> join <Replicated> => any locus type`

      By proof by contradiction, it can be shown that when the code arrives
      here, it is impossible for either of the two input paths' locus to be
      Replicated. So we add two asserts here, as sketched below.
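      The two asserts amount to the following (using GPDB's locus test
      macros):

      ```c
      /* By the rule above, a Replicated locus is consumed by
       * cdbpathlocus_join and can never survive to this point, so neither
       * input path may carry one. */
      Assert(!CdbPathLocus_IsReplicated(outer.locus));
      Assert(!CdbPathLocus_IsReplicated(inner.locus));
      ```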
    • Re-enable `COPY (query) TO` on utility mode · 41a8cf29
      Adam Lee authored
      It was disabled by accident several months ago while implementing
      `COPY (query) TO ON SEGMENT`; re-enable it.
      
      ```
      commit bad6cebc
      Author: Jinbao Chen <jinchen@pivotal.io>
      Date:   Tue Nov 13 12:37:13 2018 +0800
      
          Support 'copy (select statement) to file on segment' (#6077)
      ```
      
      WARNING: there are no safety protections in utility mode; it is not
      recommended except in disaster recovery situations.
      Co-authored-by: Weinan WANG <wewang@pivotal.io>
    • Rid resource group on hashagg spill evaluation (#8199) · 40d955d6
      Weinan WANG authored
      Resource groups assume memory access is always faster than disk, and
      resource group memory management hooks into the hashagg executor node's
      spill mechanism. If the hash table size overwhelms `max_mem` in
      resource group mode, the hash table does not spill and fan out data;
      the resource group instead grants more memory to the hash table.
      However, this strategy impacts the hash collision rate, causing
      performance regressions in some OLAP queries.

      We no longer consult the resource group GUC when hashagg evaluates
      whether it needs to spill, as sketched below.
      Co-authored-by: Adam Li <ali@pivotal.io>
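      Before and after, as a simplified sketch (names illustrative):

      ```c
      /* Before: under resource groups, never spill; grow the hash table and
       * accept a higher collision rate instead. */
      if (hashtable->mem_used > max_mem && !IsResGroupEnabled())
          spill_hash_table(hashtable);

      /* After: the spill decision depends only on the memory quota, with or
       * without resource groups. */
      if (hashtable->mem_used > max_mem)
          spill_hash_table(hashtable);
      ```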
  9. 22 Jul 2019, 6 commits
    • Fix memory quota calculation of aggregation · 79c83ea1
      Adam Lee authored
      MemoryAccounting_RequestQuotaIncrease() returns a number in bytes, but
      the caller here expects kB.
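      The mismatch in a nutshell (a sketch; the real call's arguments are
      elided):

      ```c
      /* MemoryAccounting_RequestQuotaIncrease() reports bytes ... */
      uint64 quota_bytes = MemoryAccounting_RequestQuotaIncrease(/* ... */);

      /* ... but the aggregation code consumed the value as kB, making the
       * quota off by a factor of 1024.  Convert before use: */
      uint64 quota_kB = quota_bytes / 1024;
      ```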
    • Refactor to remove raw_buf_done from external scan · 585e90b6
      Adam Lee authored
      scan->raw_buf_done was used only for custom external tables; refactor
      to remove the MERGE_FIXME.

      cstate->raw_buf_len is safe to use, since we operate on pstate->raw_buf
      directly in this case.
    • Remove some unnecessary MERGE_FIXMEs · b6afd44c
      Adam Lee authored
      About `isjoininner`: I searched the commit history of the merge branch;
      it was removed upstream in 9.2 by "e2fa76d8 - Use parameterized paths
      to generate inner indexscans more flexibly". That MERGE_FIXME was there
      because, at the time, functions relying on `isjoininner` refused to
      compile.
    • Expand sreh rejected count to int64 · 1f8254a8
      Adam Lee authored
      If there are more than INT_MAX rejected rows, this will overflow. That
      is possible at least if you specify the segment reject limit as a
      percentage.

      Still keep the SEGMENT REJECT LIMIT value itself as an int; expanding
      that would break lots of things, like the catalog, for too little
      benefit.
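      The shape of the change (field names approximate):

      ```c
      typedef struct CdbSreh
      {
          /* ... other fields ... */
          int64   rejectcount;   /* was int: with SEGMENT REJECT LIMIT given
                                  * as a percentage, more than INT_MAX rows
                                  * can be rejected, overflowing 32 bits */
          int     rejectlimit;   /* stays int: widening it would ripple
                                  * through the catalog for little benefit */
      } CdbSreh;
      ```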
    • Place cdbsreh counting into copy function · 1425f036
      Adam Lee authored
      Now the processed and rejected counting happen only in NextCopyFrom(),
      which reads the next tuple from the file; this makes much more sense.
    • Keep the order of reusing idle gangs · 51a7ea27
      Ning Yu authored
      For example: in the same session, query 1 has 3 slices and creates
      gang 1, gang 2 and gang 3. Query 2 has 2 slices; we want it to reuse
      gang 1 and gang 2 rather than, say, gang 3 and gang 2.

      In this way, the two queries can use the same send-receive port pairs.
      This is useful on platforms like Azure, because Azure limits the number
      of distinct send-receive port pairs (a.k.a. flows) within a certain
      time period.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Paul Guo <pguo@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
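      A sketch of the intent (illustrative, not the actual cdbgang code):

      ```c
      /* Keep idle gangs ordered by creation time and always reuse from the
       * front: after a 3-slice query creates gangs 1..3, a following
       * 2-slice query picks gangs 1 and 2 again, and therefore reuses the
       * same send-receive port pairs. */
      static Gang *
      get_idle_gang(void)
      {
          Gang *gang;

          if (idle_gangs == NIL)
              return NULL;             /* nothing cached; create a new gang */

          gang = (Gang *) linitial(idle_gangs);    /* oldest-created first */
          idle_gangs = list_delete_first(idle_gangs);
          return gang;
      }
      ```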
  10. 18 Jul 2019, 1 commit
    • Consider non direct dispatch cost for cached plan. · 6f936827
      Hubert Zhang authored
      A prepared statement binds parameters for each execution. It needs to
      decide whether to use a cached generic plan without params or a custom
      plan with params. In the past, GPDB used plan cost plus re-plan cost to
      choose between the generic and the custom plan. But a generic plan does
      not contain param values, so unlike a custom plan it cannot be
      direct-dispatched.

      A non-direct-dispatch plan introduces unnecessary QEs, which still need
      to go through the volcano model, do two-phase commit and write prepare
      xlog. So in some cases the cost of failing to generate a direct
      dispatch plan is higher than the re-plan cost, which makes the custom
      plan run faster than the generic plan even though it must re-plan on
      every execute.

      Note that non-direct-dispatch cost is not considered by the planner
      yet. The planner treats direct dispatch as an optimization and always
      enables it when possible. But for prepared statements, a generic plan
      cannot generate a direct dispatch plan at all, and we need to account
      for that cost. As a result, we introduce the non-direct-dispatch cost
      into the total cost only for cached plans, as sketched below.
      Co-authored-by: Ning Yu <nyu@pivotal.io>
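      The idea, sketched against plancache.c's choose_custom_plan() (the
      non_direct_dispatch_cost term is illustrative):

      ```c
      /* A cached generic plan cannot be direct-dispatched, so charge it for
       * the QEs it would needlessly involve; the custom plan instead pays
       * the cost of re-planning on every EXECUTE. */
      Cost generic_total = plansource->generic_cost + non_direct_dispatch_cost;
      Cost custom_total  = avg_custom_cost + replan_cost;

      /* Favor the custom plan whenever direct dispatch saves more than
       * re-planning costs. */
      return custom_total < generic_total;
      ```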
  11. 17 Jul 2019, 1 commit
    • Use the correct time unit in BackoffSweeper backend · 16522314
      Pengzhou Tang authored
      Commit bfd1f46c used the wrong time unit (expected ms, passed µs) in
      the BackoffSweeper backend, which prevents it from re-calculating the
      CPU shares in time; normal backends then sleep for more CPU ticks than
      before in CHECK_FOR_INTERRUPTS, causing a performance downgrade.
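      The bug class, reduced to a sketch (set_timer_interval_ms() is
      hypothetical):

      ```c
      int sweep_interval_us = 500 * 1000;   /* 500ms, held in microseconds */

      /* Bug: passing microseconds where ms are expected stretches the
       * interval 1000x, so CPU shares are recalculated far too late. */
      set_timer_interval_ms(sweep_interval_us);

      /* Fix: convert to the unit the API expects. */
      set_timer_interval_ms(sweep_interval_us / 1000);
      ```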
  12. 15 Jul 2019, 2 commits
    • Pass correct in-progress array pointer for writing to file · e2886364
      Asim R P authored
      In the case of extended queries, snapshot information is shared between
      reader and writer QEs using files.  The writer obtains a snapshot and
      writes it to a file.  This patch fixes a bug that passed an incorrect
      address for the array of in-progress transactions when writing to the
      file.  The bug caused hard-to-reproduce errors in production workloads
      that involved extended queries (bind/execute libpq messages, declare
      cursor statements, and certain PL/* statements such as RETURN and
      EXECUTE in pl/pgsql).

      Reviewed-by: Adam Berlin and Jesse Zhang
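      The bug class, as a sketch (names approximate):

      ```c
      TransactionId *xids = snapshot->inProgressXidArray;

      /* Buggy: writes the bytes of the pointer variable itself. */
      FileWrite(file, (char *) &xids, count * sizeof(TransactionId));

      /* Fixed: writes the in-progress transaction array it points to. */
      FileWrite(file, (char *) xids, count * sizeof(TransactionId));
      ```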
    • Unittest for writing and reading cursor snapshot · 3304ac00
      Asim R P authored
      The functions under test are dumpSharedLocalSnapshot_forCursor() and
      readSharedLocalSnapshot_forCursor().  A significant amount of global
      state needs to be created for the test to be able to invoke the two
      functions.  What we are testing here is whether correct snapshot
      information is included in what is written to the file.  Validation is
      performed by reading the contents of the file and comparing them with
      expected values.

      An implicit rule to run a unit test is defined in src/Makefile.mock.
      This patch overrides it so that the directory required for the cursor
      snapshot file is created before running the test.

      Reviewed-by: Adam Berlin and Jesse Zhang
  13. 13 Jul 2019, 1 commit
    • Disable shareinputscan with outer refs. · b658e3ed
      Richard Guo authored
      Currently shareinputscan doesn't handle rescan properly, and fully
      supporting that would require a lot of code changes. After some
      discussion, we decided to disable shareinputscan with outer refs for
      now.
  14. 12 Jul 2019, 1 commit
    • Skip starting the stats sender process when gpperfmon is not enabled (#8126) · 4a347206
      Wang Hao authored
      Before GP6, the stats sender process served both gpperfmon and the
      metrics collector, so it had to start if either of them was enabled.
      As of GP6, the metrics collector has become a standalone bgworker, so
      this commit fixes the startup condition for the stats sender: it now
      starts only when gpperfmon is enabled.
  15. 11 Jul 2019, 2 commits
    • Skip cdbcomponent_updateCdbComponents for FTS · f37bcae1
      Pengzhou Tang authored
      FTS takes responsibility for updating gp_segment_configuration. In each
      FTS probe cycle, FTS first gets a copy of the current configuration,
      then probes the segments based on it, and finally frees the copy at the
      end. In the probe stage, FTS might start and close transactions many
      times, so FTS should not update its current copy of
      gp_segment_configuration when a new transaction is started.
    • Merge pull request #8109 from magi345/master_bgworker_fix · b6c1b467
      Wang Hao authored
      Auxiliary bgworkers should skip resource group assignment