1. 29 Sep 2018, 3 commits
    • Use WITH syntax for options in create tablespace. (#5877) · e92a82d0
      Paul Guo committed
      PostgreSQL 9.4 started allowing a WITH clause for options in CREATE
      TABLESPACE. Greenplum previously used its own OPTIONS syntax to support
      per-segment locations. Unify them to use only the WITH syntax, following
      upstream.
      
      Note that the Greenplum-specific OPTIONS syntax exists in gpdb master only.
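      
      A hedged sketch of the unified syntax (paths are illustrative, and the
      per-segment option keys are assumed to follow the contentN convention):
      
      ```sql
      -- Per-segment locations now ride on the upstream WITH clause instead
      -- of the old Greenplum-only OPTIONS syntax.
      CREATE TABLESPACE fastspace
          LOCATION '/data/default'
          WITH (content0='/data/seg0', content1='/data/seg1');
      ```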
    • Correctly account for OptimizerOutstandingMemoryBalance · cd13f364
      Taylor Vesely committed
      Because the optimizer has its own memory management system, and does not
      make use of AllocationSets, the only way to know how much memory the
      optimizer is using is to intercept its calls to malloc() and free().
      When the GUC 'optimizer_use_gpdb_allocators' is set to 'true', Orca will
      replace its native alloc() and free() methods with Ext_OptimizerAlloc()
      and Ext_OptimizerFree(). These calls track the total memory usage in the
      active 'Optimizer' memory account, and the total outstanding memory
      between queries in OptimizerOutstandingMemoryBalance.
      
      This is a problem when accounting for ORCA memory in the
      'X_NestedExecutor' account: unless you add in the
      OptimizerOutstandingMemoryBalance, ORCA can free memory that was
      allocated in a previous query and underflow the account.
      
      Both to get an accurate idea of how much memory the optimizer is using
      and to prevent problems with the 'X_NestedExecutor' account, it makes
      sense to track ORCA's memory usage in a single account. Therefore, only
      create one 'Optimizer' account per query, no matter how many times the
      optimizer is invoked.
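      
      A hedged illustration of observing the accounting (both GUCs appear in
      this message and the next commit; the exact setting value and the
      output shape are assumptions):
      
      ```sql
      -- Route Orca's allocations through Ext_OptimizerAlloc()/Ext_OptimizerFree().
      SET optimizer_use_gpdb_allocators = on;
      
      -- With memory accounting shown in EXPLAIN ANALYZE, all Orca allocations
      -- for this query should land in a single 'Optimizer' account.
      SET explain_memory_verbosity = 'detail';
      EXPLAIN ANALYZE SELECT count(*) FROM pg_class;
      ```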
      Co-authored-by: David Kimura <dkimura@pivotal.io>
    • Add DEBUG mode to the explain_memory_verbosity GUC · 21f8a491
      Taylor Vesely committed
      The memory accounting system generates a new memory account for every
      execution node initialized in ExecInitNode. The addresses of these
      memory accounts are stored in shortLivingMemoryAccountArray. If the
      memory allocated for shortLivingMemoryAccountArray is full, we repalloc
      the array with double the number of available entries.
      
      After creating approximately 67,000,000 memory accounts, doubling the
      array again requires allocating more than 1GB of memory (an array of
      2^27 eight-byte pointers no longer fits under the 1GB allocation
      limit), which throws an ERROR and cancels the running query.
      
      PL/pgSQL and SQL functions create new executors/plan nodes that must be
      tracked by the memory accounting system. This level of detail is not
      necessary for tracking memory leaks, and creating a separate memory
      account for every executor uses a large amount of memory just to track
      these memory accounts.
      
      Instead of tracking millions of individual memory accounts, we
      consolidate any child executor account into a special 'X_NestedExecutor'
      account. If explain_memory_verbosity is set to 'detailed' or below, all
      child executors are consolidated into this account.
      
      If more detail is needed for debugging, set explain_memory_verbosity to
      'debug'; then, as was the previous behavior, every executor is assigned
      its own MemoryAccountId.
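      
      A hedged usage sketch (the GUC and its 'debug' value are from this
      message; the function being called is hypothetical):
      
      ```sql
      -- Consolidated accounting: executors spawned by the function calls
      -- roll up into a single 'X_NestedExecutor' account.
      SET explain_memory_verbosity = 'detail';
      EXPLAIN ANALYZE SELECT my_plpgsql_fn(i) FROM generate_series(1, 1000) i;
      
      -- Debug mode: every nested executor gets its own MemoryAccountId,
      -- restoring the old level of detail (and its memory overhead).
      SET explain_memory_verbosity = 'debug';
      ```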
      
      Originally we tried to remove nested execution accounts after they
      finished executing, but rolling those accounts over into an
      'X_NestedExecutor' account proved impractical to accomplish without the
      possibility of a future regression.
      
      If any accounts created between nested executors were not rolled over
      into an 'X_NestedExecutor' account, recording which accounts had been
      rolled over could grow the same way that shortLivingMemoryAccountArray
      grows today, and would likewise become too large to reasonably fit in
      memory. Iterating through the SharedHeaders every time a nested
      executor finishes is not likely to be very performant either.
      
      While we were at it, convert some of the convenience macros dealing with
      memory accounting for executor/planner nodes into functions, and move
      them out of the memory accounting header files into their sole callers'
      compilation units.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
  2. 28 Sep 2018, 13 commits
    • Order active window clauses for greater reuse of Sort nodes. · 3f0d46f7
      Daniel Gustafsson committed
      This is a backport of the below commit from postgres 12dev, which in
      turn is a patch influenced by an optimization in the previous version of
      the Greenplum window code. The idea is to order the Sort nodes by sort
      prefixes, so that sorts can be reused by subsequent nodes.
      
      As this uses EXPLAIN in the test output, a new expected file is added for
      ORCA output even though the patch only touches the postgres planner.
      
        commit 728202b6
        Author: Andrew Gierth <rhodiumtoad@postgresql.org>
        Date:   Fri Sep 14 17:35:42 2018 +0100
      
          Order active window clauses for greater reuse of Sort nodes.
      
          By sorting the active window list lexicographically by the sort clause
          list but putting longer clauses before shorter prefixes, we generate
          more chances to elide Sort nodes when building the path.
      
          Author: Daniel Gustafsson (with some editorialization by me)
          Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
          Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
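      
      A hedged illustration of the effect (table and columns hypothetical):
      with the longer sort clause ordered first, the window with the shorter
      prefix can reuse the Sort node produced for the first.
      
      ```sql
      -- Processing the (a, b) window first leaves the input already sorted
      -- on the prefix (a), so the planner can elide the second Sort.
      EXPLAIN (COSTS OFF)
      SELECT sum(x) OVER (ORDER BY a, b),
             avg(x) OVER (ORDER BY a)
      FROM tbl;
      ```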
    • Fix syntax errors in testsuites that aren't on purpose · 3f6273e1
      Daniel Gustafsson committed
      There were a few cases of broken queries in the test suites that weren't
      broken on purpose to exercise the parser/grammar. This fixes the ones
      that stood out, but there are likely more hiding in ignore blocks that
      slip through the cracks.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Remove unnecessary code for the first ORDER BY column in window agg. · e70f73e0
      Heikki Linnakangas committed
      The purpose of this code was to treat the first ORDER BY column in a
      window agg, like "ROW_NUMBER() OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING
      AND 2 FOLLOWING)", the same way as volatile expressions, and add it to
      the target list as is. That was to ensure it would be available for
      computing the window bounds. But upstream commit a2099360, merged as
      part of the 9.3 merge, got rid of the distinction between volatile and
      non-volatile expressions, so we no longer need to treat the first ORDER
      BY column any differently either.
    • Fix mixup between bitmap index's int48 and int42 support functions. · e628194e
      Heikki Linnakangas committed
      These were swapped. It's been wrong ever since we merged the operator
      family patch, during the 8.3 merge. But apparently it wasn't causing any
      ill effect, or at least I was not able to find a case that would fail
      because of it.
      
      This was caught by new sanity checks in the 'opr_sanity' regression
      test, introduced in the upcoming 9.4 merge.
    • Fix the one-element tuple visibility cache in heap scans, for multi-xids. · fec5f0b5
      Heikki Linnakangas committed
      It's not cool to use the raw xmax value as part of the cache key. If the
      raw xmax represents a multi-xid, the real deleter XID would be something
      else. We could get fooled, if we cached a multi-XID value, and later saw
      a tuple with a regular xmax, with the same numerical value as the cached
      multi-XID.
      
      I think this was actually broken even before the 9.3 merge. If a
      transaction locked one tuple and deleted another tuple, and a concurrent
      scan saw the locked tuple first, it might think that the deleted tuple
      is also visible to it, because it has the same xmin+xmax combination as
      the locked tuple.
    • Move code marked with FIXME to make_windowInputTargetList(). · fa4a2ccb
      Heikki Linnakangas committed
      make_windowInputTargetList() seems like a better place for this code,
      as suggested by the FIXME comment that was left here in the 9.3 merge.
    • Disable DISTINCT-qualified aggregates with ORCA. · 386d5423
      Heikki Linnakangas committed
      The new regression tests revealed that it doesn't work. With an assertion-
      enabled ORCA build, I got an assertion failure like this:
      
      2018-08-23 11:20:08:371479 EEST,THD000,ERROR,""/home/heikki/gpdb/optimizer-main/libgpos/include/gpos/common/CDynamicPtrArray.h:300: Failed assertion: pos < m_size && ""Out of bounds access""
      Stack trace:
      1    0x00007f363fb3e78a gpos::CException::Raise + 252
      2    0x00007f3640be0970 gpos::CDynamicPtrArray + 84
      3    0x00007f3640c93dac gpopt::CWindowPreprocessor::SplitPrjList + 1162
      4    0x00007f3640c9404b gpopt::CWindowPreprocessor::SplitSeqPrj + 303
      5    0x00007f3640c94b61 gpopt::CWindowPreprocessor::PexprSeqPrj2Join + 357
      6    0x00007f3640c95276 gpopt::CWindowPreprocessor::PexprPreprocess + 316
      7    0x00007f3640c240a2 gpopt::CExpressionPreprocessor::PexprPreprocess + 1098
      8    0x00007f3640bc2d62 gpopt::CQueryContext::CQueryContext + 696
      9    0x00007f3640bc36df gpopt::CQueryContext::PqcGenerate + 1413
      10   0x00007f3640c95d86 gpopt::COptimizer::PdxlnOptimize + 1042
      11   0x000055b2e8252f26 COptTasks::OptimizeTask + 1488
      12   0x00007f363fb58a0d gpos::CTask::Execute + 183
      13   0x00007f363fb5d447 gpos::CWorker::Execute + 199
      14   0x00007f363fb56d77 gpos::CAutoTaskProxy::Execute + 287
      15   0x00007f363fb3479b gpos_exec + 800
      "",",,,,,,"explain  select dt, pn, sum(distinct pn) over (partition by dt), sum(pn) over (partition by dt) from sale;",0,,"COptTasks.cpp",545,
      2018-08-23 11:20:08.372392 EEST,"heikki","postgres",p19807,th1163394560,"[local]",,2018-08-23 11:19:53 EEST,0,con4,cmd7,seg-1,,dx6,,sx1,"LOG","00000","Planner produced plan :1",,,,,,"explain  select dt, pn, sum(distinct pn) over (partition by dt), sum(pn) over (partition by dt) from sale;",0,,"orca.c",61,
      
      This caused the query to fall back to the planner, which worked. But
      with assertions disabled, it crashed instead.
      
      We should fix ORCA to deal with that. One option is to rip out all the
      special code that plans DISTINCT-qualified aggregates in ORCA, and just
      pass the windistinct flag through to the executor. That's basically what
      the Postgres planner does, and the executor will deal with deduplicating
      the input. But for now, let's just stop the crashing.
    • Reimplement DISTINCT for window aggregates. · 6523b432
      Heikki Linnakangas committed
      GPDB 5 supported DISTINCT in window aggregates, e.g.:
      
      COUNT(DISTINCT x) OVER (PARTITION BY y)
      
      However, PostgreSQL does not support that, and as a result GPDB lost the
      capability as part of the window functions rewrite, too. Upstream has an
      explicit check for this, but that check was lost in the rewrite, so the
      parser accepted the DISTINCT while executing the aggregate as if there
      were no DISTINCT. There were also no tests that would return a different
      result with the DISTINCT than without, which is why no one noticed.
      
      To fix, implement DISTINCT support to the same extent that the old
      implementation supported it. The new implementation adds a little sort +
      deduplicate step for each DISTINCT aggregate. I'm not sure how this
      compares with the old implementation performance-wise, but at least it
      works now.
      
      Also, add the missing tests.
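      
      A hedged demo of the restored behavior, using inline VALUES so it is
      self-contained: within partition y = 1 the value 1 appears twice, so
      the two counts differ.
      
      ```sql
      SELECT y,
             count(x)          OVER (PARTITION BY y) AS cnt_all,
             count(DISTINCT x) OVER (PARTITION BY y) AS cnt_distinct
      FROM (VALUES (1, 1), (1, 1), (1, 2), (2, 3)) AS t(y, x);
      -- Partition y = 1: cnt_all = 3, cnt_distinct = 2.
      ```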
    • Fix the interval issue when setting up TCP interconnect · 3c08295f
      Pengzhou Tang committed
      When TCP connections cannot be set up for a long time, we check whether
      some segments have already failed; the check is a high-cost operation,
      so we set the interval between checks to 2 seconds. We used to use a
      counter to track the interval, which is not reliable because a loop
      cycle (500ms) may be interrupted early by EINTR/EAGAIN from select().
      
      To avoid hurting the setup performance of the TCP interconnect, we need
      to make the interval mechanism more reliable.
    • Allow tables to be distributed on a subset of segments · 4eb65a53
      ZhangJackey committed
      There was an assumption in gpdb that a table's data is always
      distributed on all segments; however, this is not always true. For
      example, when a cluster is expanded from M segments to N (N > M), all
      the tables are still on M segments. To work around the problem we used
      to have to alter all the hash-distributed tables to randomly distributed
      to get correct query results, at the cost of bad performance.
      
      Now we support table data being distributed on a subset of segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on. By doing so we can allow DMLs on the M-segment tables,
      and joins between M- and N-segment tables are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
    • db9a2ec4
    • docs - update boostfs install instructions. (#5859) · f932f7ee
      Mel Kiyama committed
      - How to find the boostfs configuration guide
      - How to find the boostfs RPM
    • docs - sql and catalog ref page updates for i/o conversion casts (#5812) · 97e7d520
      Lisa Owen committed
      * docs - sql and catalog ref page updates for i/o conversion casts
      
      * address comments from heikki
  3. 27 Sep 2018, 12 commits
    • Update the generated pipeline file · c2759ad7
      Joao Pereira committed
      The commit 7605710c did not update the yml file with the pipeline
      configuration for master.
    • Change the way quicklz is compiled and used in CI · 7605710c
      David Kimura committed
      - Due to changes in the structure of gpaddon we can no longer use the
        gpaddon_src resource to compile 5X for the Binary Swap jobs.
      - From this point on we should use the 5X_RELEASE tag on gpaddon to
        compile Greenplum for these jobs.
      - Change the expected quicklz error message while building OSS Greenplum.
      - Explicitly add the Greenplum bin folder to the path.
      - Add back the rsync of the quicklz addon folder, to ensure that the
        enterprise build still works correctly.
      - Use the correct branch to compile the Binary Swap version.
      
      Ensure that quicklz is not built for Windows: at this time we will not
      support compiling and installing quicklz for our Windows build.
      Signed-off-by: Joao Pereira <jdealmeidapereira@pivotal.io>
    • Remove built-in stub functions for QuickLZ compressor. · 589533be
      Heikki Linnakangas committed
      The proprietary build can install them as normal C language functions,
      with CREATE FUNCTION, instead.
      
      In passing, remove some unused QuickLZ debugging GUCs.
      
      This doesn't yet get rid of all references to QuickLZ, unfortunately.
      The GUC and reloption validation code still needs to know about it, so
      that it can validate the options read from postgresql.conf when starting
      up the postmaster. For the same reason, you cannot yet add custom
      compression algorithms besides quicklz as an extension. But this is
      another step in the right direction, anyway.
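      
      A hedged sketch of what "install as normal C functions" could look like
      (the function name, signature, and library path here are all
      hypothetical; the real entry points are whatever the compression
      machinery expects):
      
      ```sql
      -- A proprietary build might ship the compressor in its own shared
      -- library and register it at install time, roughly like this.
      CREATE FUNCTION quicklz_constructor(internal, internal, bool)
          RETURNS internal
          AS '$libdir/gp_quicklz', 'quicklz_constructor'
          LANGUAGE C;
      ```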
      Co-authored-by: Jimmy Yih <jyih@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
    • Rename variable to avoid risk of type collision · 5da162c4
      Daniel Gustafsson committed
      The comment states that "small" might be defined by socket.h, and while
      that's not true for all versions of sys/socket.h, it's still not a good
      name to use, as it's common in Windows headers (should we ever revive a
      Windows port). Renaming to a non-colliding name is a small price to pay
      to avoid subtle bugs, so rename it and remove the preprocessor dance.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Refactor qp_functions test suite · 8f770d6d
      Daniel Gustafsson committed
      The test suite, which was ported over from TINC, was ignoring so much of
      the memorized output that it more or less didn't test anything (and the
      ignored blocks were as full of outdated output as one would imagine).
      The code was also formatted in weird ways and had needless NOTICEs
      thrown during execution.
      
      This refactors the test suite to remove all ignore blocks, removes some
      utterly pointless tests (there are many more of them left), formats the
      code to be readable, fixes the output to work, and removes some
      duplicate tests.
      
      The remaining bits of the suite are by no means terribly interesting,
      but they run fast enough that it's worth keeping the leftovers for now.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Remove compile_gpdb_windows_cl job from master pipeline (#5804) · b0f124b5
      Peifeng Qiu committed
      Upstream has upgraded the Windows compile script and uses a newer
      version of Perl. This may block the current merging effort. We plan to
      do native Windows compilation for gpdb 6, so this job is no longer
      necessary for gpdb_master.
    • Dispatcher can create flexible size gang (#5701) · a3ddac06
      Tang Pengzhou committed
      * Change the type of db_descriptors to SegmentDatabaseDescriptor **
      
      A new gang definition may consist of cached segdbDescs and newly created
      segdbDescs; there is no need to palloc all segdbDesc structs as new.
      
      * Remove unnecessary allocate gang unit test
      
      * Manage idle segment dbs using CdbComponentDatabases instead of the available* lists.
      
      To support variable-size gangs, we now need to manage segment dbs at a
      lower granularity. Previously, idle QEs were managed by a bunch of lists
      such as availablePrimaryWriterGang and availableReaderGangsN, which
      restricted the dispatcher to creating only N-size (N = number of
      segments) or 1-size gangs.
      
      CdbComponentDatabases is a snapshot of the segment components within the
      current cluster, and it now maintains a freelist for each segment
      component. When creating a gang, the dispatcher makes up the gang from
      each segment component (taking from the freelist or creating a new
      segment db). When cleaning up a gang, the dispatcher returns the idle
      segment dbs to their segment components.
      
      CdbComponentDatabases provides a few functions to manipulate segment
      dbs (SegmentDatabaseDescriptor *):
      * cdbcomponent_getCdbComponents
      * cdbcomponent_destroyCdbComponents
      * cdbcomponent_allocateIdleSegdb
      * cdbcomponent_recycleIdleSegdb
      * cdbcomponent_cleanupIdleSegdbs
      
      CdbComponentDatabases is also FTS-version sensitive, so once the FTS
      version changes, CdbComponentDatabases destroys all idle segment dbs
      and allocates QEs on the newly promoted segments. This makes mirror
      failover transparent to users.
      
      Since segment dbs (SegmentDatabaseDescriptor *) are now managed by
      CdbComponentDatabases, we can simplify the memory context management by
      replacing GangContext & perGangContext with DispatcherContext &
      CdbComponentsContext.
      
      * Postpone error handling when creating a gang
      
      Now that we have AtAbort_DispatcherState, one advantage is that we can
      postpone gang error handling to this function and make the code
      cleaner.
      
      * Handle FTS version changes correctly
      
      In some cases, when the FTS version changes, we can't update the current
      snapshot of segment components; more specifically, we can't destroy the
      current writer segment dbs and create new segment dbs.
      
      These cases include:
      * the session has created a temp table.
      * the query needs two-phase commit and the gxid has already been
        dispatched to segments.
      
      * Replace the <gangId, sliceId> map with a <qeIdentifier, sliceId> map
      
      We used to dispatch a <gangId, sliceId> map along with the query so that
      segment dbs knew which slice they should execute.
      
      Now the gangId is useless to a segment db, because a segment db can be
      reused by different gangs, so we need a new way to convey this
      information. To resolve this, CdbComponentDatabases assigns a unique
      identifier to each segment db and builds, for each slice, a bitmap set
      of the segment identifiers; a segment db can then walk the slice table
      and find the right slice to execute.
      
      * Allow the dispatcher to create variable-size gangs and refine
        AssignGangs()
      
      Previously, the dispatcher could only create N-size gangs for
      GANGTYPE_PRIMARY_WRITER or GANGTYPE_PRIMARY_READER. This restricted the
      dispatcher in many ways. One example is direct dispatch: it always
      created an N-size gang even when it dispatched the command to a single
      segment. Another example is that some operations could use gangs larger
      than N; for instance, a hash join whose inner and outer plans are both
      redistributed could be associated with a larger-than-N gang. This commit
      changes the API of createGang() so the caller can specify a list of
      segments (partial or even duplicate segments); CdbComponentDatabases
      guarantees that each segment has only one writer in a session. This also
      resolves another pain point of AssignGangs(): the caller no longer needs
      to promote a GANGTYPE_PRIMARY_READER to GANGTYPE_PRIMARY_WRITER, or a
      GANGTYPE_SINGLETON_READER to GANGTYPE_PRIMARY_WRITER for replicated
      tables (see FinalizeSliceTree()).
      
      With this commit, AssignGangs() is very clear now.
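      
      A hedged illustration of one payoff (table and distribution key are
      hypothetical): a direct-dispatch query touching a single segment can
      now run on a 1-segment gang instead of a full N-size gang.
      
      ```sql
      -- The equality predicate on the distribution key pins the row to one
      -- segment, so the dispatcher only needs a gang on that segment.
      EXPLAIN SELECT * FROM orders WHERE order_id = 42;
      ```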
    • Remove remove_subquery_in_RTEs() call in standard_planner() (#5863) · 69cd1ec5
      Paul Guo committed
      As the comment said, this was useful; however, now that we have the
      upstream add_rte_to_flat_rtable() to handle it, let's remove this call.
    • Move pxf-infra to new consolidated repo · abccfe61
      Divya Bhargov committed
      Co-authored-by: Divya Bhargov <dbhargov@pivotal.io>
      Co-authored-by: Lav Jain <ljain@pivotal.io>
    • Remove unused variable · cc853420
      Daniel Gustafsson committed
      Fixes clang (and probably gcc) compiler warning on unused variable.
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
    • Increase default value of wal_keep_segments GUC. · dd18c4a0
      David Kimura committed
      Until we have replication slots, this keeps enough xlog segments around
      so that mirrors have an opportunity to reconnect when a checkpoint
      removes a segment while the mirror is not streaming.
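      
      A hedged sketch for inspecting the setting (the gpconfig invocation is
      the usual cluster-wide path for changing it):
      
      ```sql
      -- Check how many WAL segment files are retained for lagging mirrors;
      -- cluster-wide changes would normally go through gpconfig, e.g.
      --   gpconfig -c wal_keep_segments -v <n>
      SHOW wal_keep_segments;
      ```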
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
    • Mark various objects as internal, for purposes of object access hooks. · a673ddaa
      Heikki Linnakangas committed
      As far as I can see, the 'is_internal' flag is passed through to a
      possible object access hook, but it has no other effect. Mark the LOV
      index and heap created for bitmap indexes, as well as constraints
      created for exchanged partitions, as 'internal'.
  4. 26 Sep 2018, 12 commits