1. 23 September 2017, 3 commits
    • Add a long living account for Relinquished Memory · 1822c826
      Kavinder Dhaliwal authored
      There are cases where, during execution, a Memory Intensive (MI)
      operator may not use all the memory that is allocated to it. This
      extra memory (quota - allocated) can be relinquished for other MI
      nodes to use during execution of the statement. For example:
      
      ->  Hash Join
            ->  HashAggregate
            ->  Hash
      In the above plan fragment the Hash Join operator has an MI operator
      in both its inner and outer subtree. If the Hash node uses much less
      memory than it was given as its quota, it will now call
      MemoryAccounting_DeclareDone(), and the difference between its quota
      and its allocated amount will be added to the allocated amount of the
      RelinquishedPool. This enables HashAggregate to request memory from
      the RelinquishedPool if it exhausts its quota, to prevent spilling.
      
      This PR adds two new APIs to the MemoryAccounting framework:
      
      MemoryAccounting_DeclareDone(): Add the difference between a memory
      account's quota and its allocated amount to the long living
      RelinquishedPool
      
      MemoryAccounting_RequestQuotaIncrease(): Retrieve all relinquished
      memory by incrementing an operator's operatorMemKb and setting the
      RelinquishedPool to 0
      
      Note: This PR introduces the facility for Hash to relinquish memory to
      the RelinquishedPool memory account and for the Agg operator
      (specifically HashAgg) to request an increase to its quota before it
      builds its hash table. This commit does not generally apply this
      paradigm to all MI operators.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
    • Cherry-pick 'ae47eb1' from upstream to fix Nested CTE errors (#3360) · 009b1809
      sambitesh authored
      Before this cherry-pick, the query below would have errored out:
      
      WITH outermost(x) AS (
        SELECT 1
        UNION (WITH innermost as (SELECT 2)
               SELECT * FROM innermost
               UNION SELECT 3)
      )
      SELECT * FROM outermost;
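
      With the fix, the query runs to completion and returns three rows:
      1, 2 and 3.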
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
    • Add gp_stat_replication view · 1546ec3b
      Taylor Vesely authored
      In order to view the primary segments' replication stream data from
      their pg_stat_replication view, we currently need to connect to each
      primary segment individually in utility mode. To make life easier, we
      introduce a function that fetches each primary segment's replication
      stream data and wrap it with a view named gp_stat_replication. It is
      now possible to view all the cluster's replication information from
      the master in a regular psql session.
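
      For example, assuming the view exposes a gp_segment_id column
      alongside the usual pg_stat_replication columns, one could run this
      from the master:

      SELECT gp_segment_id, application_name, state, sync_state
      FROM gp_stat_replication;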
      
      Authors: Taylor Vesely and Jimmy Yih
  2. 22 September 2017, 2 commits
  3. 21 September 2017, 9 commits
    • Fix bug in handling re-scan of a hash join. · f7101d98
      Heikki Linnakangas authored
      The WITH RECURSIVE test case in 'join_gp' would miss some rows, if
      the hash algorithm (src/backend/access/hash/hashfunc.c) was replaced
      with the one from PostgreSQL 8.4, or if statement_mem was lowered from
      1000 kB to 700 kB. This is what happened:
      
      1. A tuple belongs to batch 0, and is kept in memory during processing
         batch 0.
      
      2. The outer scan finishes, and we spill the inner batch 0 from memory
         to a file, with SpillFirstBatch, and start processing batch 1.
      
      3. While processing batch 1, the number of batches is increased, and
         the tuple that belonged to batch 0, and was already written to
         batch 0's file, is moved to a later batch.
      
      4. After the first scan is complete, the hash join is re-scanned
      
      5. We reload the batch file 0 into memory. While reloading, we encounter
         the tuple that now doesn't seem to belong to batch 0, and throw it
         away.
      
      6. We perform the rest of the re-scan. We have missed any matches to the
         tuple that was thrown away. It was not part of the later batch files,
         because in the first pass, it was handled as part of batch 0. But in
         the re-scan, it was not handled as part of batch 0, because nbatch was
         now larger, so it didn't belong there.
      
      To fix: when, while reloading a batch file, we see a tuple that
      actually belongs to a later batch file, we write it to that later
      file. To avoid adding it there multiple times if the hash join is
      re-scanned multiple times, whenever any tuples are moved while
      reloading a batch file, we destroy the batch file and re-create it
      with just the remaining tuples.
      
      This is made a bit complicated by the fact that BFZ temp files don't
      support appending to a file that has already been rewound for reading.
      So what we actually do is always re-create the batch file, even if
      there have been no changes to it. I left comments about that. Ideally,
      we would either support re-appending to BFZ files, or stop using BFZ
      workfiles for this altogether (I'm not convinced they're any better
      than plain BufFiles). But that can be done later.
      
      Fixes github issue #3284
    • Don't double-count inner tuples reloaded from file. · 429ff8c4
      Heikki Linnakangas authored
      ExecHashTableInsert also increments the counter, so we don't need to
      do it here. This is harmless AFAICS; the counter isn't used for
      anything but instrumentation at the moment, but it confused me while
      debugging.
    • Fix CURRENT OF to work with PL/pgSQL cursors. · 91411ac4
      Heikki Linnakangas authored
      Before, this only worked for cursors declared with DECLARE CURSOR. You
      got a "there is no parameter $0" error if you tried. This moves the
      decision on whether a plan is "simply updatable" from the parser to
      the planner. Doing it in the parser was awkward, because we only want
      to do it for queries that are used in a cursor, and for SPI queries we
      don't know that yet at that time.
      
      For some reason, the copy, out and read functions of CurrentOfExpr
      were missing the cursor_param field. While we're at it, reorder the
      code to match upstream.
      
      This only makes the required changes to the Postgres planner. ORCA has never
      supported updatable cursors. In fact, it will fall back to the Postgres
      planner on any DECLARE CURSOR command, so that's why the existing tests
      have passed even with optimizer=off.
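
      A minimal sketch of what now works (the table and function names are
      made up):

      CREATE TABLE accounts (id int, balance int) DISTRIBUTED BY (id);

      CREATE FUNCTION zero_first_fetched() RETURNS void AS $$
      DECLARE
          cur CURSOR FOR SELECT * FROM accounts;
          rec RECORD;
      BEGIN
          OPEN cur;
          FETCH cur INTO rec;
          -- this UPDATE used to fail with "there is no parameter $0"
          UPDATE accounts SET balance = 0 WHERE CURRENT OF cur;
          CLOSE cur;
      END;
      $$ LANGUAGE plpgsql;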
    • Remove now-unnecessary code from gp_read_error_log to dispatch the call. · 4035881e
      Heikki Linnakangas authored
      There was code in gp_read_error_log() to "manually" dispatch the call
      to all the segments, if it was executed in the dispatcher. This was
      previously necessary because, even though the function was marked with
      prodataaccess='s', the planner did not guarantee that it would be
      executed on the segments when called in the target list, as in
      "SELECT gp_read_error_log('tab')". Now that we have the EXECUTE ON ALL
      SEGMENTS syntax, and are more rigorous about enforcing it in the
      planner, this hack is no longer required.
    • Refactor resource group source code, part 2. · a2cf9bdf
      Ning Yu authored
      * resgroup: provide helper funcs for memory usage updates.
      
      We used to have complex and duplicated logic to update group and slot
      memory usage in different contexts; now we provide two helper
      functions to increase or decrease memory usage in the group and slot.
      
      Two badly named functions, `attachToSlot()` and `detachFromSlot()`,
      are now retired.
      
      * resgroup: provide helper function to unassign a dropped resgroup.
      
      * resgroup: move complex checks into helper functions.
      
      Many helper functions with descriptive names were added to improve the
      readability of a number of complex checks.
      
      Also added a pointer to the resource group slot in `self`.
      
      * resgroup: add helper functions for wait queue operations.
    • Make gp_replication.conf for USE_SEGWALREP only. · b7ce6930
      Ashwin Agrawal authored
      The intent of this extra configuration file is to control the
      synchronization between primary and mirror for WALREP.
      
      The gp_replication.conf file is not designed to work with filerep; for
      example, scripts like gp_expand will fail, since they directly modify
      the configuration files instead of going through initdb.
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
    • Take advantage of the new EXECUTE ON syntax in gp_toolkit. · 9a039e4f
      Heikki Linnakangas authored
      Also change a few regression tests to use the new syntax, instead of
      gp_toolkit's __gp_localid and __gp_masterid functions.
    • Add support for CREATE FUNCTION EXECUTE ON [MASTER | ALL SEGMENTS] · aa148d2a
      Heikki Linnakangas authored
      We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting
      prodataaccess='s'. This exposes the functionality to users via DDL, and adds
      support for the EXECUTE ON MASTER case.
      
      There was discussion on gpdb-dev about also supporting ON MASTER AND ALL
      SEGMENTS, but that is not implemented yet. There is no handy "locus" in the
      planner to represent that. There was also discussion about making a
      gp_segment_id column implicitly available for functions, but that is also
      not implemented yet.
      
      The old behavior was that if a function was marked as IMMUTABLE, it
      could be executed anywhere; otherwise it was always executed on the
      master. For backwards-compatibility, this keeps that behavior for
      EXECUTE ON ANY (the default), so even if a function is marked as
      EXECUTE ON ANY, it will always be executed on the master unless it's
      IMMUTABLE.
      
      There is no support for these new options in ORCA. Using any ON MASTER
      or ON ALL SEGMENTS functions in a query causes ORCA to fall back. This
      is the same as with the prodataaccess='s' hack that this replaces, but
      now that it is more user-visible, it would be nice to teach ORCA about
      it.
      
      The new options are only supported for set-returning functions, because for
      a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how
      the results should be combined. ON MASTER would probably be doable, but
      there's no need for that right now, so punt.
      
      Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can
      only be used in the FROM clause, or in the target list of a simple SELECT
      with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM
      foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY
      functions, which is the default, work the same as before.
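
      A sketch of the new DDL (the function body is illustrative):

      CREATE FUNCTION seg_report() RETURNS SETOF text AS
      $$ SELECT 'port ' || current_setting('port') $$
      LANGUAGE sql EXECUTE ON ALL SEGMENTS;

      -- allowed: in the FROM clause; runs once on every primary segment
      SELECT * FROM seg_report();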
    • Fix multistage aggregation plan targetlists · 41640e69
      Bhuvnesh Chaudhary authored
      If an aggregation query uses aliases that match the table's actual
      columns, those aliases are propagated up from subqueries, and grouping
      is applied on a column alias, the aggregation plan may end up with
      inconsistent target lists, causing a crash.
      
      	CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;
      	SELECT substr(a, 2) as a
      	FROM
      		(SELECT ('-'||a)::varchar as a
      			FROM (SELECT a FROM t1) t2
      		) t3
      	GROUP BY a;
  4. 20 September 2017, 6 commits
    • Dump more detailed info for memory usage in gp_resgroup_status · 2816fe67
      Pengzhou Tang authored
      This commit adds more detailed memory metrics to the 'memory_usage'
      column of gp_resgroup_status, including current/available memory usage
      for a group, current/available memory usage for a slot, and
      current/available memory usage for the shared part.
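
      The per-group breakdown can then be inspected with something like:

      SELECT rsgname, memory_usage
      FROM gp_toolkit.gp_resgroup_status;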
    • resource group: refine ResGroupSlotAcquire · 4646bbc6
      Gang Xiong authored
      Previously, waiters waiting on a dropped resource group needed to be
      reassigned to a new group; to achieve that, ResGroupSlotAcquire had
      been made complicated and hard to understand. This commit refines it.
      
      Author: Gang Xiong <gxiong@pivotal.io>
    • resgroup: Allow concurrency to be zero. · 77007ff6
      Pengzhou Tang authored
      Allow CREATE RESOURCE GROUP and ALTER RESOURCE GROUP to set
      concurrency to 0, so that after some time there will be no running
      queries and the resource group can be dropped. On drop, all pending
      queries will be moved to the new resource group assigned to the role;
      but if the role is also dropped, the pending queries will all be
      canceled. We also do not allow setting the concurrency of the admin
      group to zero: superusers are in the admin group and only superusers
      can alter resource groups, so if the concurrency of the admin group
      were ever set to zero, there would be no chance to set it again.
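
      For example (the cpu_rate_limit and memory_limit values are
      illustrative):

      CREATE RESOURCE GROUP rg_batch WITH
          (concurrency=0, cpu_rate_limit=10, memory_limit=10);
      ALTER RESOURCE GROUP rg_batch SET concurrency 0;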
      Signed-off-by: Ning Yu <nyu@pivotal.io>
    • Report error when 'COPY (SELECT ...) TO' with 'ON SEGMENT' · cbddcc86
      Ming LI authored
      Because we don't know the data location of the result of a SELECT
      query, ON SEGMENT is forbidden in this case.
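
      A sketch of a command that is now rejected (the file path is
      illustrative):

      COPY (SELECT * FROM t) TO '/tmp/out_<SEGID>.csv' ON SEGMENT;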
    • Remove the restriction on sum of memory_spill_ratio and memory_shared_quota. · c5a5780a
      Richard Guo authored
      This commit makes two changes:
      1. Remove the restriction that the sum of memory_spill_ratio and
      memory_shared_quota must be no larger than 100.
      2. Change the range of memory_spill_ratio to [0, 100].
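
      With the restriction removed, settings whose sum exceeds 100 are
      accepted, e.g. (the group name is illustrative):

      ALTER RESOURCE GROUP rg_batch SET memory_shared_quota 50;
      ALTER RESOURCE GROUP rg_batch SET memory_spill_ratio 80;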
    • Fix warning of passing const to non-const parameter. · f4417c50
      Hubert Zhang authored
      The function FaultInjectorIdentifierStringToEnum(faultName) passes a
      const string to a non-const parameter, which causes a build warning.
      On second thought, we already support injecting a fault by fault name
      without a corresponding fault identifier, so it's better to use the
      fault name instead of the fault enum identifier in the ereport.
  5. 19 September 2017, 4 commits
  6. 18 September 2017, 2 commits
    • Add sanity checks for unrecognized window frame options. · c7c158dd
      Heikki Linnakangas authored
      These shouldn't happen, but Coverity warned about them. GCC would also
      complain, but I've been compiling with -Wno-maybe-uninitialized
      lately, because of the noise.
      
      Actually, this isn't quite enough; ORCA also needs to mark GPOS_RAISE
      with the "noreturn" attribute, so that the compiler gets the hint.
      Opened https://github.com/greenplum-db/gporca/pull/234 about that.
    • Using fault name instead of enum as the key of fault hash table (#3249) · 4616d3ec
      Huan Zhang authored
      GPDB's fault injector uses the fault enum as the key of the fault hash
      table. If someone wants to inject a fault into a GPDB extension (a
      separate repo), she has to hard-code the extension-related fault enums
      into GPDB core code, which is not good practice. So we simply use the
      fault name as the hash key, removing the need to hard-code the fault
      enums. Note that the fault injector API doesn't change.
  7. 17 September 2017, 2 commits
    • Convert WindowFrame to frameOptions + start + end · ebf9763c
      Heikki Linnakangas authored
      In GPDB, we have so far used a WindowFrame struct to represent the
      start and end window bounds in a ROWS/RANGE BETWEEN clause, while
      PostgreSQL uses the combination of a frameOptions bitmask and start
      and end expressions. Refactor to replace WindowFrame with the upstream
      representation.
    • Hardcode the "frame maker" function for LEAD and LAG. · 686aab95
      Heikki Linnakangas authored
      This removes pg_window.winframemakerfunc column. It was only used for
      LEAD/LAG, and only in the Postgres planner. Hardcode the same special
      handling for LEAD/LAG in planwindow.c instead, based on winkind.
      
      This is one step in refactoring the planner and executor further, to
      replace the GPDB implementation of window functions with the upstream
      one.
  8. 16 September 2017, 3 commits
    • Fix check for superuser_reserved_connections. · 06ea112c
      Heikki Linnakangas authored
      Upstream uses >= here. It was changed in GPDB to use > instead of >=,
      but I don't see how that's more correct or better. I tracked that
      change in the old pre-open-sourcing repository to this commit:
      
      commit f3e98a1ef5fc5915662077b137c563371ea1c0a4
      Date: Mon Apr 6 15:04:33 2009 -0800
      
         Fixed guc check for ReservedBackends.
      
         [git-p4: depot-paths = "//cdb2/main/": change = 33269]
      
      So there was no explanation there either of what the alleged problem
      was.
    • Fix CREATE TABLE AS VALUES ... DISTRIBUTED BY · 47936ab2
      Heikki Linnakangas authored
      Should call setQryDistributionPolicy() after applyColumnNames(), otherwise
      the column names specified in the CREATE TABLE cannot be used in the
      DISTRIBUTED BY clause. Add test case.
      
      Fixes github issue #3285.
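
      A sketch of the kind of statement this fixes (the names are
      illustrative):

      CREATE TABLE ctas_t (a, b) AS
      VALUES (1, 'one'), (2, 'two')
      DISTRIBUTED BY (a);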
    • Remove function isMemoryIntensiveFunction · 5c9b81ef
      Kavinder Dhaliwal authored
      Historically this function was used to special-case a few operators
      that were not considered memory-intensive. However, it now always
      returns true. This commit removes the function and also moves the case
      for T_FunctionScan in IsMemoryIntensiveOperator into the group that
      always returns true, as this is its current behavior.
  9. 15 September 2017, 9 commits
    • Make it possible to build without libbz2, also on non-Windows. · d6749c3c
      Heikki Linnakangas authored
      The bzip2 library is only used by the gfile/fstream code, used for external
      tables and gpfdist. The usage of bzip2 was in #ifndef WIN32 blocks, so it
      was only built on non-Windows systems.
      
      Instead of tying it to the platform, use a proper autoconf check and
      HAVE_LIBBZ2 flags. This makes it possible to build gpfdist with bzip2
      support on Windows, as well as building without bzip2 on non-Windows
      systems. That makes it easier to test the otherwise Windows-only codepaths
      on other platforms. --with-libbz2 is still the default, but you can now use
      --without-libbz2 if you wish.
      
      I'm sure that some regression tests will fail if you actually build the
      server without libbz2, but I'm not going to address that right now. We have
      similar problems with other features that are in principle optional, but
      cause some regression tests to fail.
      
      Also use "#ifdef HAVE_LIBZ" rather than "#ifndef WIN32" to enable/disable
      zlib support in gpfdist. Building the server still fails if you use
      --without-zlib, but at least you can build the client programs without
      zlib, also on non-Windows systems.
      
      Remove obsolete copy of bzlib.h from the repository while we're at it.
    • Fix stanullfrac computation on column with all-wide values. · 90bcf3fd
      Heikki Linnakangas authored
      If the sample of a column consists entirely of "too wide" values,
      which are left out of the sample when it's passed to the compute_stats
      function, we pass an empty sample to it. The default compute_stats
      gets confused by that, and computes the null fraction as 0 / 0 = NaN,
      so we end up storing NaN as stanullfrac.
      
      If all the values in the sample are wide values, then they're surely
      not NULLs, so the right thing to do is to store stanullfrac = 0. That
      is a bit inconsistent with the normal compute_stats function, which
      effectively treats too-wide values as not existing at all, which
      artificially inflates the null fraction. Another inconsistency is that
      we store stawidth=1024 in this special case, while the normal
      computation again ignores the wide values when computing stawidth. If
      we wanted to do something about that, we should adjust the normal
      computation to take those wide values better into account, but that's
      a different story; at least we no longer store NaN in stanullfrac.
      
      Fixes github issue #3259.
    • Stop supporting SQL type aliases in ALTER TYPE SET DEFAULT ENCODING. · b4f125bd
      Heikki Linnakangas authored
      This is a bit unfortunate, in case someone is using them. But as it
      happens, we haven't even mentioned the ALTER TYPE SET DEFAULT ENCODING
      command in the documentation, so there probably aren't many people using
      them, and you can achieve the same thing by using the normal, non-alias,
      names like "varchar" instead of "character varying".
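
      For example, the catalog type name still works (the storage options
      are illustrative), while the alias spelling is no longer accepted:

      ALTER TYPE varchar SET DEFAULT ENCODING
          (compresstype=zlib, compresslevel=1);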
    • Move extended grouping processing to after transforming window functions. · 0b246cec
      Heikki Linnakangas authored
      This way we don't need the weird half-transformation of WindowDefs. Makes
      things simpler.
    • Fix remaining equal* functions to not compare 'location' field. · c6613ddd
      Heikki Linnakangas authored
      The 'location' field is just to give better error messages. It should not
      be considered when testing whether two nodes are equal. (Note that the
      COMPARE_LOCATION_FIELD() macro that we now consistently use on the
      'location' field is a no-op.)
      
      I noticed this while working on a patch that would compare two ColumnRefs
      to see if they are equal, and could be collapsed to one.
    • Rewrite the way a DTM initialization error is logged, to retain file & lineno. · c6f931fe
      Heikki Linnakangas authored
      While working on the 8.4 merge, I had a bug that tripped an Insist
      inside a PG_TRY-CATCH. That was very difficult to track down, because
      of the way the error is logged here: using ereport() includes the
      filename and line number where the error is re-emitted, not the
      original place. So all I got was "Unexpected internal error" in the
      log, with a meaningless filename & lineno.
      
      This rewrites the way the error is reported so that it preserves the
      original filename and line number. It will also use the original error
      level and preserve all the other fields.
    • Fix crash when COPY reports an "unexpected message type" error · c7a382c6
      Ming LI authored
    • Fix Bug: spi_execute assert fail when there's no query mem · 5d6447ae
      Zhenghua Lyu authored
      A resource group can be configured such that a query's query memory is
      zero; in such cases, the query will use work memory instead. And since
      query_mem's type is uint64, we simply remove the assert in the SPI
      execution code.
    • Remove gp_fault_strategy catalog table and corresponding code. · f5b5c218
      Ashwin Agrawal authored
      Using the gp_segment_configuration catalog table, one can easily find
      whether mirrors exist or not; we do not need a special table to
      communicate the same. Earlier, gp_fault_strategy conveyed 'n' for a
      mirrorless system, 'f' for replication, and 's' for SAN mirrors. Since
      support for 's' was removed in 5.0, the only purpose gp_fault_strategy
      served was to indicate whether the system was mirrored. Hence we
      delete the gp_fault_strategy table and, at the required places, use
      gp_segment_configuration to find the required info.