1. 23 Sep 2017, 1 commit
    • Add a long living account for Relinquished Memory · 1822c826
      Kavinder Dhaliwal committed
      There are cases where during execution a Memory Intensive Operator (MI)
      may not use all the memory that is allocated to it. This means that this
      extra memory (quota - allocated) can be relinquished for other MI nodes
      to use during execution of a statement. For example:
      
      ->  Hash Join
               ->  HashAggregate
               ->  Hash
      In the above query fragment the Hash Join operator has an MI operator in
      both its inner and outer subtrees. If the Hash node uses much less memory
      than it was given as its quota, it will now call
      MemoryAccounting_DeclareDone(), and the difference between its quota and
      its allocated amount will be added to the allocated amount of the
      RelinquishedPool. This enables HashAggregate to request memory from the
      RelinquishedPool if it exhausts its quota, to prevent spilling.
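
      A query of roughly this shape could produce the fragment above (tables and
      columns are hypothetical, not taken from the commit); the aggregate feeds
      the outer side of the join while the Hash is built on the smaller table:

      ```
      SELECT a.dept, d.name, a.cnt
      FROM (SELECT dept, count(*) AS cnt FROM emp GROUP BY dept) a
      JOIN dept_info d ON a.dept = d.id;
      ```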
      
      This PR adds two new APIs to the MemoryAccounting Framework:
      
      MemoryAccounting_DeclareDone(): Add the difference between a memory
      account's quota and its allocated amount to the long living
      RelinquishedPool
      
      MemoryAccounting_RequestQuotaIncrease(): Retrieve all relinquished
      memory by incrementing an operator's operatorMemKb and setting the
      RelinquishedPool to 0
      
      Note: This PR introduces the facility for Hash to relinquish memory to
      the RelinquishedPool memory account and for the Agg operator
      (specifically HashAgg) to request an increase to its quota before it
      builds its hash table. This commit does not generally apply this
      paradigm to all MI operators
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
  2. 22 Sep 2017, 1 commit
    • Enable ORCA to be tracked by Mem Accounting · 669dd279
      Kavinder Dhaliwal committed
      Before this commit all memory allocations made by ORCA/GPOS were a
      black box to GPDB. However, the groundwork was already in place to allow
      GPDB's Memory Accounting Framework to track memory consumption by ORCA.
      This commit introduces two new functions,
      Ext_OptimizerAlloc and Ext_OptimizerFree, which
      pass their parameters through to gp_malloc and gp_free and do some
      bookkeeping against the Optimizer memory account. This introduces very
      little overhead to the GPOS memory management framework.
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
  3. 21 Sep 2017, 3 commits
    • Refactor resource group source code, part 2. · a2cf9bdf
      Ning Yu committed
      * resgroup: provide helper funcs for memory usage updates.
      
      We used to have complex and duplicated logic to update group and slot
      memory usage in different contexts; now we provide two helper
      functions to increase or decrease memory usage in the group and slot.
      
      The two badly named functions `attachToSlot()` and `detachFromSlot()`
      are now retired.
      
      * resgroup: provide helper function to unassign a dropped resgroup.
      
      * resgroup: move complex checks into helper functions.
      
      Many helper functions with descriptive names were added to increase
      the readability of a number of complex checks.
      
      Also added a pointer to the resource group slot in `self`.
      
      * resgroup: add helper functions for wait queue operations.
    • Make gp_replication.conf for USE_SEGWALREP only. · b7ce6930
      Ashwin Agrawal committed
      The intent of this extra configuration file is to control the
      synchronization between primary and mirror for WALREP.
      
      The gp_replication.conf is not designed to work with filerep; for
      example, scripts like gp_expand will fail, since they directly modify
      the configuration files instead of going through initdb.
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
    • Add support for CREATE FUNCTION EXECUTE ON [MASTER | ALL SEGMENTS] · aa148d2a
      Heikki Linnakangas committed
      We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting
      prodataaccess='s'. This exposes the functionality to users via DDL, and adds
      support for the EXECUTE ON MASTER case.
      
      There was discussion on gpdb-dev about also supporting ON MASTER AND ALL
      SEGMENTS, but that is not implemented yet. There is no handy "locus" in the
      planner to represent that. There was also discussion about making a
      gp_segment_id column implicitly available for functions, but that is also
      not implemented yet.
      
      The old behavior was that if a function was marked as IMMUTABLE, it could
      be executed anywhere; otherwise it was always executed on the master. For
      backwards-compatibility, this keeps that behavior for EXECUTE ON ANY (the
      default), so even if a function is marked as EXECUTE ON ANY, it will
      always be executed on the master unless it's IMMUTABLE.
      
      There is no support for these new options in ORCA. Using any ON MASTER or
      ON ALL SEGMENTS functions in a query causes ORCA to fall back. This is the
      same as with the prodataaccess='s' hack that this replaces, but now that it
      is more user-visible, it would be nice to teach ORCA about it.
      
      The new options are only supported for set-returning functions, because for
      a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how
      the results should be combined. ON MASTER would probably be doable, but
      there's no need for that right now, so punt.
      
      Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can
      only be used in the FROM clause, or in the target list of a simple SELECT
      with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM
      foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY
      functions, which is the default, work the same as before.
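
      A quick sketch of the new DDL (the function name and body are illustrative
      only, not from the commit):

      ```
      CREATE FUNCTION seg_greeting() RETURNS SETOF text AS
      $$ SELECT 'hello from a segment'::text $$
      LANGUAGE SQL EXECUTE ON ALL SEGMENTS;

      SELECT * FROM seg_greeting();       -- allowed: function in FROM clause
      SELECT seg_greeting();              -- allowed: simple SELECT, no FROM
      -- SELECT seg_greeting() FROM foo;  -- rejected
      ```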
  4. 20 Sep 2017, 5 commits
    • Dump more detailed info for memory usage in gp_resgroup_status · 2816fe67
      Pengzhou Tang committed
      In this commit, we add more detailed memory metrics to the 'memory_usage'
      column of gp_resgroup_status, including current/available memory usage
      for the group, for each slot, and for the shared part.
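
      The extra detail can be inspected from the view, e.g. (a sketch; column
      names as in GPDB 5's gp_toolkit schema):

      ```
      SELECT rsgname, memory_usage
      FROM gp_toolkit.gp_resgroup_status;
      ```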
    • resource group: refine ResGroupSlotAcquire · 4646bbc6
      Gang Xiong committed
      Previously, waiters waiting on a dropped resource group needed to be
      reassigned to a new group. To achieve this, ResGroupSlotAcquire had
      become complicated and hard to understand; this commit refines it.
      
      Author: Gang Xiong <gxiong@pivotal.io>
    • resgroup: Allow concurrency to be zero. · 77007ff6
      Pengzhou Tang committed
      Allow CREATE RESOURCE GROUP and ALTER RESOURCE GROUP to set concurrency
      to 0, so that after some time there will be no running queries and the
      resource group can be dropped. On drop, all pending queries are moved to
      the new resource group assigned to the role; if the role is also dropped,
      the pending queries are all canceled. Note that we do not allow setting
      the concurrency of the admin group to zero: superusers belong to the
      admin group and only a superuser can alter a resource group, so once the
      admin group's concurrency were set to zero, there would be no way to set
      it back.
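
      A sketch of the drop workflow this enables (the group name is hypothetical):

      ```
      ALTER RESOURCE GROUP rg_batch SET CONCURRENCY 0;  -- no new queries start
      -- ... wait for the running queries to finish ...
      DROP RESOURCE GROUP rg_batch;
      ```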
      Signed-off-by: Ning Yu <nyu@pivotal.io>
    • Remove the restriction on sum of memory_spill_ratio and memory_shared_quota. · c5a5780a
      Richard Guo committed
      This commit makes two changes:
      1. Remove the restriction that sum of memory_spill_ratio and memory_shared_quota
      must be no larger than 100.
      2. Change the range of memory_spill_ratio to be [0, 100].
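
      For example, this is now valid even though the two settings sum to more
      than 100 (group name and values are illustrative):

      ```
      CREATE RESOURCE GROUP rg_test WITH (
          concurrency=5, cpu_rate_limit=20, memory_limit=20,
          memory_shared_quota=80, memory_spill_ratio=50);
      ```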
    • Fix warning of passing const to non-const parameter. · f4417c50
      Hubert Zhang committed
      The function FaultInjectorIdentifierStringToEnum(faultName) passed a const
      string to a non-const parameter, which caused a build warning. On second
      thought, we already support injecting a fault by fault name without a
      corresponding fault identifier, so it's better to use the fault name
      instead of the fault enum identifier in the ereport.
  5. 19 Sep 2017, 3 commits
  6. 18 Sep 2017, 1 commit
    • Using fault name instead of enum as the key of fault hash table (#3249) · 4616d3ec
      Huan Zhang committed
      GPDB's fault injector uses the fault enum as the key of the fault hash
      table. If someone wants to inject faults into GPDB extensions (a separate
      repo), she has to hard-code the extension-related fault enums into GPDB
      core code, which is not a good practice.
      So we simply use the fault name as the hash key, removing the need to
      hard-code the fault enums. Note that the fault injector API doesn't change.
  7. 17 Sep 2017, 1 commit
    • Convert WindowFrame to frameOptions + start + end · ebf9763c
      Heikki Linnakangas committed
      In GPDB, we have so far used a WindowFrame struct to represent the start
      and end window bounds in a ROWS/RANGE BETWEEN clause, while PostgreSQL
      uses the combination of a frameOptions bitmask and start and end
      expressions. Refactor to replace the WindowFrame with the upstream
      representation.
  8. 16 Sep 2017, 1 commit
    • Remove function isMemoryIntensiveFunction · 5c9b81ef
      Kavinder Dhaliwal committed
      Historically this function was used to special-case a few operators that
      were not considered memory-intensive. However, it now always returns
      true. This commit removes the function and also moves the T_FunctionScan
      case in IsMemoryIntensiveOperator into the group that always returns
      true, as this matches its current behavior.
  9. 15 Sep 2017, 2 commits
    • Make it possible to build without libbz2, also on non-Windows. · d6749c3c
      Heikki Linnakangas committed
      The bzip2 library is only used by the gfile/fstream code, used for external
      tables and gpfdist. The usage of bzip2 was in #ifndef WIN32 blocks, so it
      was only built on non-Windows systems.
      
      Instead of tying it to the platform, use a proper autoconf check and
      HAVE_LIBBZ2 flags. This makes it possible to build gpfdist with bzip2
      support on Windows, as well as building without bzip2 on non-Windows
      systems. That makes it easier to test the otherwise Windows-only codepaths
      on other platforms. --with-libbz2 is still the default, but you can now use
      --without-libbz2 if you wish.
      
      I'm sure that some regression tests will fail if you actually build the
      server without libbz2, but I'm not going to address that right now. We have
      similar problems with other features that are in principle optional, but
      cause some regression tests to fail.
      
      Also use "#ifdef HAVE_LIBZ" rather than "#ifndef WIN32" to enable/disable
      zlib support in gpfdist. Building the server still fails if you use
      --without-zlib, but at least you can build the client programs without
      zlib, also on non-Windows systems.
      
      Remove obsolete copy of bzlib.h from the repository while we're at it.
    • Remove gp_fault_strategy catalog table and corresponding code. · f5b5c218
      Ashwin Agrawal committed
      Using the gp_segment_configuration catalog table one can easily find
      whether mirrors exist or not; we do not need a special table to
      communicate the same. Earlier, gp_fault_strategy conveyed 'n' for a
      mirrorless system, 'f' for replication, and 's' for SAN mirrors. Since
      support for 's' was removed in 5.0, the only purpose gp_fault_strategy
      served was to indicate a mirrored or non-mirrored system. Hence, delete
      the gp_fault_strategy table and, where required, use
      gp_segment_configuration to find the needed info.
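
      For example, mirror presence can be derived as:

      ```
      SELECT count(*) > 0 AS has_mirrors
      FROM gp_segment_configuration
      WHERE role = 'm';
      ```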
  10. 14 Sep 2017, 3 commits
    • Remove unused ENABLE_LTRACE code. · d994b38e
      Heikki Linnakangas committed
      Although I'm not too familiar with SystemTap, I'm pretty sure that recent
      versions can do user space tracing better. I don't think anyone is using
      these hacks anymore, so remove them.
    • Refactor resource group source code. · d145bd11
      Ning Yu committed
      * resgroup: move MyResGroupSharedInfo into MyResGroupProcInfo.
      
      MyResGroupSharedInfo is now replaced with MyResGroupProcInfo->group.
      
      * resgroup: retire resGranted in PGPROC.
      
      When resGranted == false we must have resSlotId == InvalidSlotId, and
      when resGranted != false we must have resSlotId != InvalidSlotId, so we
      can retire resGranted and keep only resSlotId.
      
      * resgroup: rename sharedInfo to group.
      
      In resgroup.c there used to be both `group` and `sharedInfo` for the
      same thing; now only `group` is used.
      
      * resgroup: rename MyResGroupProcInfo to self.
      
      We want to use this variable directly, so a short name is better.
    • Allow "agg() OVER window" syntax, without parens. · 428b6cde
      Heikki Linnakangas committed
      This is the spec-compliant spelling, but GPDB has only allowed
      "agg() OVER (window)" so far. With this commit, the parens are still
      allowed, for backwards-compatibility.
      
      Change deparsing code to also use the non-parens syntax in view definitions
      and EXPLAIN. Adjust expected output of regression tests accordingly.
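
      Both spellings are now accepted (table and window names are illustrative):

      ```
      SELECT avg(salary) OVER w          -- spec-compliant, no parens
      FROM emp WINDOW w AS (PARTITION BY dept);

      SELECT avg(salary) OVER (w)        -- old GPDB spelling, still allowed
      FROM emp WINDOW w AS (PARTITION BY dept);
      ```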
  11. 13 Sep 2017, 1 commit
    • Fix assign_wal_consistency_checking function · 357f8d22
      Jimmy Yih committed
      It was noticed on CentOS 6 that the entries in the char* list would
      become blank after freeing the raw string obtained while setting the
      GUC wal_consistency_checking. We should free this string only after we
      finish using it.
      
      Authors: Jimmy Yih and Taylor Vesely
  12. 12 Sep 2017, 4 commits
    • Make gp_resgroup_status work with concurrent resgroup drops. · 5893ca3c
      Pengzhou Tang committed
      The view gp_toolkit.gp_resgroup_status collects resgroup status
      information on both the QD and the QEs, before and after a 300ms delay,
      to measure CPU usage. When there are concurrent resgroup drops, it might
      fail due to missing resgroups.
      
      This is fixed by holding an ExclusiveLock, which blocks the drop operations.
      Signed-off-by: Ning Yu <nyu@pivotal.io>
    • Set resWaiting under ResGroupLock to avoid race condition · 88233315
      Pengzhou Tang committed
      Previously in ResGroupSlotRelease() we set the resWaiting flag to false
      outside the ResGroupLock, so a race condition could happen: the waitProc
      that was meant to be woken up is canceled just before its resWaiting is
      set to false; in that window the waitProc runs another query and waits
      for a free slot again; then resWaiting is set, and the waitProc gets
      another, unexpected free slot.
      
      E.g., with rg1 having concurrency = 1:
      
      s1: BEGIN;
      s2: BEGIN; -- blocked
      s1: END; -- breakpoint set just before resWaiting is set in ResGroupSlotRelease()
      s2: cancel it;
      s3: BEGIN; -- will get a free slot
      s2: BEGIN; -- run BEGIN again and be blocked
      s1: continue;
      s2: gets an unexpected free slot
      
      To avoid leaking resgroup slots, we set resWaiting under the ResGroupLock,
      so the waitProc has no window in which to re-fetch a slot before
      resWaiting is set. It's hard to add a regression test: adding a breakpoint
      in ResGroupSlotRelease may block all other operations, because it holds
      the ResGroupLock.
    • Move GPDB-specific GetTupleVisibilitySummary functions to separate file. · ce1e8159
      Heikki Linnakangas committed
      tqual.c is quite a long file, and there are plenty of legitimate
      MPP-specific changes in it compared to upstream. Move these functions to
      a separate file, to reduce our diff footprint. This hopefully makes
      merging easier in the future.
      
      Run pgindent over the new file, and do some manual tidying up. Also, use
      CStringGetTextDatum() instead of the more complicated dance with
      DirectFunctionCall and textin(), to convert C strings into text Datums.
    • Split WindowSpec into separate before and after parse-analysis structs. · 789f443d
      Heikki Linnakangas committed
      In the upstream, two different structs are used to represent a window
      definition: WindowDef in the grammar, which is transformed into
      WindowClause during parse analysis. In GPDB, we've been using the same
      struct, WindowSpec, in both stages. Split it up, to match the upstream.
      
      The representation of the window frame, i.e. "ROWS/RANGE BETWEEN ..." was
      different between the upstream implementation and the GPDB one. We now use
      the upstream frameOptions+startOffset+endOffset representation in raw
      WindowDef parse node, but it's still converted to the WindowFrame
      representation for the later stages, so WindowClause still uses that. I
      will switch over the rest of the codebase to the upstream representation as
      a separate patch.
      
      Also, refactor WINDOW clause deparsing to be closer to upstream.
      
      One notable difference is that the old WindowSpec.winspec field corresponds
      to the winref field in WindowDef and WindowClause, except that the new
      'winref' is 1-based, while the old field was 0-based.
      
      Another noteworthy thing is that this forbids specifying "OVER (w
      ROWS/RANGE BETWEEN ...)" if the window "w" already specified a window
      frame, i.e. a different ROWS/RANGE BETWEEN. There was one such case in the
      regression suite, in window_views, and this updates the expected output of
      that to be an error.
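
      For example, something like this is now rejected (table and column are
      illustrative):

      ```
      SELECT sum(x) OVER (w ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
      FROM t
      WINDOW w AS (ORDER BY x ROWS UNBOUNDED PRECEDING);  -- ERROR: w already has a frame
      ```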
  13. 09 Sep 2017, 1 commit
  14. 08 Sep 2017, 4 commits
  15. 07 Sep 2017, 8 commits
    • Remove remnants of "EXCLUDE [CURRENT ROW|GROUP|TIES|NO OTHERS]" syntax. · 646cdc60
      Heikki Linnakangas committed
      It hasn't been implemented, but there is basic support in the grammar,
      just enough to detect the syntax and throw an error or ignore it. All the
      rest was dead code.
    • Fix a compilation error and some warnings introduced by the recursive CTEs. · 7cb69995
      Heikki Linnakangas committed
      * In ruleutils.c, the ereport() was broken. Use elog() instead, like in
        the upstream. (elog() is fine for "can't happen" kind of sanity checks)
      
      * Remove a few unused local variables.
      
      * Add a missing cast from Plan * to Node *.
    • Un-hide recursive CTE on master [#150861534] · 20152cbf
      Jesse Zhang committed
      We will be less conservative and enable recursive CTE by default on
      master, rather than keeping it hidden while we progress on developing
      the feature.
      
      This reverts the following two commits:
      * 280c577a "Set gp_recursive_cte_prototype GUC to true in test"
      * 4d5f8087 "Guard Recursive CTE behind a GUC"
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
    • Evaluate lesser joins to produce best join tree · 6ad94ff2
      Jemish Patel committed
      Previously we set the value of
      `optimizer_join_arity_for_associativity_commutativity` to a very large
      number, so ORCA would spend a very long time evaluating all possible
      n-way join combinations to come up with the cheapest join tree to use in
      the plan.
      
      We are reducing this value to `7`, as it does not prove beneficial to
      spend the time and resources to evaluate more than 7-way joins when
      trying to find the cheapest join tree.
    • Set gp_recursive_cte_prototype GUC to true in test · 280c577a
      Jesse Zhang committed
      Plus minor corrections in spelling and comments.
      Signed-off-by: Sam Dash <sdash@pivotal.io>
    • Guard Recursive CTE behind a GUC · 4d5f8087
      Kavinder Dhaliwal committed
      While recursive CTE is still being developed, it will be hidden from
      users behind the GUC gp_recursive_cte_prototype.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
    • Fix up ruleutils.c for CTE features. The main problem was that · 8e4b2f67
      Tom Lane committed
      get_name_for_var_field didn't have enough context to interpret a reference to
      a CTE query's output.  Fixing this requires separate hacks for the regular
      deparse case (pg_get_ruledef) and for the EXPLAIN case, since the available
      context information is quite different.  It's pretty nearly parallel to the
      existing code for SUBQUERY RTEs, though.  Also, add code to make sure we
      qualify a relation name that matches a CTE name; else the CTE will mistakenly
      capture the reference when reloading the rule.
      
      In passing, fix a pre-existing problem with get_name_for_var_field not working
      on variables in targetlists of SubqueryScan plan nodes.  Although latent all
      along, this wasn't a problem until we made EXPLAIN VERBOSE try to print
      targetlists.  To do this, refactor the deparse_context_for_plan API so that
      the special case for SubqueryScan is all on ruleutils.c's side.
      
      (cherry picked from commit 742fd06d)
    • Bring in recursive CTE to GPDB · fd61a4ca
      Haisheng Yuan committed
      The planner generates a plan that doesn't insert any motion between a
      WorkTableScan and its corresponding RecursiveUnion, because currently in
      GPDB motions are not rescannable. For example, an MPP plan for a
      recursive CTE query may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      For the current solution, the WorkTableScan is always put on the outer
      side of the topmost Join (the recursive part of the RecursiveUnion), so
      that we can safely rescan the inner child of the join without worrying
      about the materialization of a potential underlying motion. This is a
      heuristic-based plan, not a cost-based plan.
      
      Ideally, the WorkTableScan can be placed on either side of the join with any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: The hash join is temporarily disabled for plan generation of the
      recursive part, because if the hash table spills, the batch file is going
      to be removed as it executes. We have a follow-up story to make spilled
      hash tables rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
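
      The plan above corresponds to a query of roughly this shape (column names
      assumed to match the plan's filters):

      ```
      WITH RECURSIVE subdepartment AS (
          SELECT id, parent_department, name
          FROM department
          WHERE name = 'A'
        UNION ALL
          SELECT d.id, d.parent_department, d.name
          FROM department d
          JOIN subdepartment sd ON d.parent_department = sd.id
      )
      SELECT * FROM subdepartment;
      ```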
  16. 06 Sep 2017, 1 commit
    • Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Heikki Linnakangas committed
      If a prepared statement, or a cached plan for an SPI query e.g. from a
      PL/pgSQL function, contains stable functions, the stable functions were
      incorrectly evaluated only once at plan time, instead of on every execution
      of the plan. This happened to not be a problem in queries that contain any
      parameters, because in GPDB, they are re-planned on every invocation
      anyway, but non-parameter queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated all
      stable functions at planning time. Change it to false, and also rename it
      back to 'estimate', as it's called in the upstream. That flag was changed
      back in 2010, in order to allow partition pruning to work with quals
      containing stable functions, like TO_DATE. I think back then we always
      re-planned every query, so that was OK, but we do cache plans now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so that
      it still does evaluate stable functions, even when the 'estimate' flag is
      off. But when it does so, mark the plan as "one-off", meaning that it must
      be re-planned on every execution. That gives the old, intended, behavior,
      that such plans are indeed re-planned, but it still allows plans that don't
      use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
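
      A minimal way to see the intended behavior (table and column are
      hypothetical):

      ```
      PREPARE recent AS
          SELECT * FROM sales WHERE ts >= now() - interval '1 day';
      EXECUTE recent;  -- now() (stable) must be evaluated here, at execution,
      EXECUTE recent;  -- not frozen once at plan time
      ```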