1. 24 Nov 2017, 1 commit
  2. 21 Nov 2017, 1 commit
    •
      Supporting Join Optimization Levels in GPORCA · c8192690
      Committed by Bhuvnesh Chaudhary
      The concept of optimization levels is known in many enterprise
      optimizers. It enables users to control the degree of optimization
      that is employed. Optimization levels group transformations into
      bags of rules, where each bag is assigned a particular level. By
      default all rules are applied, but a user who wants to apply fewer
      rules is able to. That decision is based on domain knowledge: the
      user knows that even with fewer rules applied, the generated plan
      satisfies their needs.
      
      The Cascades optimizer, on which GPORCA is based, allows grouping
      transformation rules into optimization levels. This concept has
      also been extended to join ordering, allowing the user to pick the
      join order specified in the query, use a greedy approach, or use
      an exhaustive approach.
      
      Postgres-based planners use join_collapse_limit and
      from_collapse_limit to reduce the search space. While the
      objective of optimization levels for joins is also to reduce the
      search space, the way it does so is different: it requests the
      optimizer to apply, or not apply, a subset of rules, providing
      more flexibility to the customer. This is one of the most
      frequently requested features from our enterprise clients, who
      have a high degree of domain knowledge.
      
      This PR introduces this concept. In the immediate future we are
      planning to add different polynomial join ordering techniques with
      guaranteed bounds as part of the "Greedy" search.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      c8192690
  3. 16 Nov 2017, 1 commit
    •
      Remove hash partitioning support · 152d1223
      Committed by Daniel Gustafsson
      Hash partitioning was never fully implemented, and was never turned
      on by default. There has been no effort to complete the feature, so
      rather than carrying dead code this removes all support for hash
      partitioning. Should we ever want this feature, we will most likely
      start from scratch anyways.
      
      As a side effect of removing the unsupported MERGE/MODIFY
      commands, this previously accepted query is no longer legal:
      
      	create table t (a int, b int)
      	distributed by (a)
      	partition by range (b) (start () end(2));
      
      The syntax was the result of an incorrect rule in the parser,
      which made the start boundary optional for CREATE TABLE when it
      was only intended for MODIFY PARTITION.
      
      pg_upgrade was already checking for hash partitions, so no new
      check was required (upgrade would have been impossible anyway due
      to the hash algorithm change).
      152d1223
  4. 31 Oct 2017, 1 commit
  5. 30 Oct 2017, 1 commit
    •
      Fix a resgroup performance issue. · 0b85b9d0
      Committed by Ning Yu
      On low-end systems with 1~2 CPU cores, new queries in a cold
      resgroup can suffer from high latency when the overall load is
      very high.
      
      The root cause is that we used to set a very high cpu priority for
      the gpdb cgroups, so non-gpdb processes were scheduled with very
      low priority and high latency. GPDB processes are also affected by
      this, because postmaster and the other auxiliary processes are not
      put into the gpdb cgroups. Even the QD and QEs are not put into a
      gpdb cgroup until their transaction has begun.
      
      To fix this we made the following changes:
      * put postmaster and all its children processes into the toplevel
        gpdb cgroup;
      * provide a GUC to control the cgroup cpu priority for gpdb processes
        when resgroup is enabled;
      * set a lower cpu priority by default;
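      The priority GUC's effect can be sketched as below; the GUC name
      and the scaling formula are illustrative assumptions, not GPDB's
      actual implementation.

```python
# Hypothetical sketch: derive the toplevel gpdb cgroup's cpu.shares
# from a configurable priority GUC instead of a hard-coded very high
# value, so non-gpdb processes keep a reasonable scheduling weight.
def gpdb_cpu_shares(gp_resource_group_cpu_priority, base_shares=1024):
    """Scale the default cgroup share weight by the configured priority."""
    return base_shares * gp_resource_group_cpu_priority
```

      A lower default priority leaves more CPU weight for processes
      outside the gpdb cgroup.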
      0b85b9d0
  6. 14 Oct 2017, 1 commit
  7. 10 Oct 2017, 1 commit
    •
      Hide the two tuplesort implementations behind a common facade. · bbf40a8c
      Committed by Heikki Linnakangas
      We have two implementations of tuplesort: the "regular" one
      inherited from upstream, in tuplesort.c, and the GPDB-specific
      tuplesort_mk.c. We had modified all the callers to check the
      gp_enable_mk_sort GUC and deal with both of them. However, that
      makes merging with upstream difficult, and litters the code with
      boilerplate to check the GUC and call one of the two
      implementations.
      
      Simplify the callers, by providing a single API that hides the two
      implementations from the rest of the system. The API is the tuplesort_*
      functions, as in upstream. This requires some preprocessor trickery,
      so that tuplesort.c can use the tuplesort_* function names as is, but in
      the rest of the codebase, calling tuplesort_*() will call a "switcheroo"
      function that decides which implementation to actually call. While this
      is more lines of code overall, it keeps all the ugliness confined in
      tuplesort.h, not littered throughout the codebase.
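      The "switcheroo" can be sketched in miniature as follows; the
      Python functions stand in for the two C implementations, and the
      facade is the only place that consults the GUC.

```python
# Sketch of the facade pattern: callers use one tuplesort_* API and
# never check the GUC themselves. Implementations are stand-ins.
gp_enable_mk_sort = False  # the GUC consulted by the facade

def tuplesort_regular(rows):
    """Stand-in for upstream tuplesort.c."""
    return sorted(rows)

def tuplesort_mk(rows):
    """Stand-in for GPDB's tuplesort_mk.c (same result, different internals)."""
    return sorted(rows)

def tuplesort_performsort(rows):
    """Facade: picks the implementation based on the GUC."""
    impl = tuplesort_mk if gp_enable_mk_sort else tuplesort_regular
    return impl(rows)
```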
      bbf40a8c
  8. 09 Oct 2017, 1 commit
    •
      Decouple GUC max_resource_groups and max_connections. · cbd23ea2
      Committed by Richard Guo
      Previously there was a restriction that GUC 'max_resource_groups'
      could not be larger than 'max_connections'. This restriction could
      cause gpdb to fail to start if the two GUCs were not set properly.
      We decided to decouple these two GUCs and set a hard limit of 100
      for 'max_resource_groups'.
      cbd23ea2
  9. 26 Sep 2017, 1 commit
    •
      Convert GPDB-specific GUCs to the new "enum" type · 59165cfa
      Committed by Jacob Champion
      Several GUCs are simply enumerated strings that are parsed into integer
      types behind the scenes. As of 8.4, the GUC system recognizes a new
      type, enum, which will do this for us. Move as many as we can to the new
      system.
      
      As part of this,
      - gp_idf_deduplicate was changed from a char* string to an int, and new
        IDF_DEDUPLICATE_* macros were added for each option
      - password_hash_algorithm was changed to an int
      - for codegen_optimization_level, "none" is the default now when codegen
        is not enabled during compilation (instead of the empty string).
      
      A couple of GUCs that *could* be represented as enums
      (optimizer_minidump, gp_workfile_compress_algorithm) have been
      purposefully kept with the prior system because they require the GUC
      variable to be something other than an integer anyway.
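      What the enum GUC type does behind the scenes can be sketched as
      below: the string value is validated and mapped to an integer once
      by shared machinery, rather than by ad hoc string comparisons at
      every use site. The option table here is illustrative, not a real
      GPDB GUC's.

```python
# Hypothetical sketch of enum-GUC parsing: validate the string and
# return its integer code, failing loudly on an unknown value.
def parse_enum_guc(name, value, options):
    """Map a GUC's string value to its integer code."""
    if value not in options:
        valid = ", ".join(sorted(options))
        raise ValueError(
            f'invalid value for parameter "{name}": "{value}" (valid: {valid})')
    return options[value]
```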
      Signed-off-by: Jacob Champion <pchampion@pivotal.io>
      59165cfa
  10. 22 Sep 2017, 1 commit
    •
      Enable ORCA to be tracked by Mem Accounting · 669dd279
      Committed by Kavinder Dhaliwal
      Before this commit, all memory allocations made by ORCA/GPOS were
      a black box to GPDB. However, the groundwork was already in place
      to allow GPDB's Memory Accounting Framework to track memory
      consumption by ORCA. This commit introduces two new functions,
      Ext_OptimizerAlloc and Ext_OptimizerFree, which pass their
      parameters through to gp_malloc and gp_free and do some
      bookkeeping against the Optimizer memory account. This introduces
      very little overhead to the GPOS memory management framework.
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      669dd279
  11. 21 Sep 2017, 1 commit
    •
      Make gp_replication.conf for USE_SEGWALREP only. · b7ce6930
      Committed by Ashwin Agrawal
      The intent of this extra configuration file is to control the
      synchronization between primary and mirror for WALREP.
      
      gp_replication.conf is not designed to work with filerep; for
      example, scripts like gpexpand will fail, since they modify the
      configuration files directly instead of going through initdb.
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
      b7ce6930
  12. 20 Sep 2017, 1 commit
  13. 19 Sep 2017, 3 commits
  14. 14 Sep 2017, 1 commit
    •
      Remove unused ENABLE_LTRACE code. · d994b38e
      Committed by Heikki Linnakangas
      Although I'm not too familiar with SystemTap, I'm pretty sure that recent
      versions can do user space tracing better. I don't think anyone is using
      these hacks anymore, so remove them.
      d994b38e
  15. 07 Sep 2017, 4 commits
  16. 01 Sep 2017, 1 commit
  17. 29 Aug 2017, 1 commit
  18. 28 Aug 2017, 1 commit
  19. 24 Aug 2017, 1 commit
    •
      Add GUC 'memory_spill_ratio' for resource group. · 44373949
      Committed by Richard Guo
      The GUC 'memory_spill_ratio' can be set at multiple levels, in
      particular at the resource group level and at the session level,
      with the session setting overriding the resource group setting.
      
      When 'memory_spill_ratio' is set at the session level, semantic
      validation is not performed until the value is referenced by
      subsequent queries.
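      The two-level resolution can be sketched as below; the valid range
      used for the deferred check is illustrative, not GPDB's actual
      bound.

```python
# Sketch: a session-level setting, when present, overrides the
# group-level one; semantic validation is deferred to the point of use.
def effective_spill_ratio(group_ratio, session_ratio=None):
    """Resolve memory_spill_ratio for a query; validate only on use."""
    ratio = group_ratio if session_ratio is None else session_ratio
    if not 0 <= ratio <= 100:   # deferred semantic validation (illustrative range)
        raise ValueError("memory_spill_ratio out of range")
    return ratio
```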
      44373949
  20. 21 Aug 2017, 1 commit
    •
      Move ORCA invocation into standard_planner · d5dbbfd9
      Committed by Daniel Gustafsson
      The way ORCA was tied into the planner, running a planner_hook was
      not supported in the intended way. This commit moves ORCA into
      standard_planner() instead of planner() and leaves the hook for
      extensions to make use of, with or without ORCA. Since the
      intention of the optimizer GUC is to replace the planner in
      postgres while keeping the planning process, this allows planner
      extensions to cooperate with that.
      
      In order to reduce the Greenplum footprint in upstream postgres
      source files for future merges, the ORCA functions are moved to
      their own file.
      
      This commit also adds a memory accounting class for planner hooks,
      since they otherwise ran in the planner scope, as well as a test
      for using planner_hooks.
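      The resulting hook arrangement can be sketched as below: the entry
      point consults planner_hook, while standard_planner (which now
      contains the ORCA invocation) does the actual planning, so an
      extension's hook can wrap it and still get ORCA. Everything except
      the planner/standard_planner/planner_hook names is illustrative.

```python
# Sketch of the PostgreSQL planner_hook pattern after this change.
planner_hook = None      # an extension may assign a callable here

def standard_planner(query):
    return f"plan({query})"          # ORCA or the planner runs here

def planner(query):
    if planner_hook is not None:
        return planner_hook(query)   # the extension gets first crack
    return standard_planner(query)

def logging_hook(query):
    """An example extension hook that wraps standard_planner."""
    return "logged:" + standard_planner(query)
```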
      d5dbbfd9
  21. 15 Aug 2017, 1 commit
  22. 09 Aug 2017, 1 commit
    •
      Add debug info for interconnect network timeout · 9a9cd48b
      Committed by Pengzhou Tang
      It was very difficult to verify whether the interconnect was stuck
      in the resending phase, or whether there was UDP resending latency
      within the interconnect. To improve this, this commit records a
      debug message every Gp_interconnect_debug_retry_interval retries
      when gp_log_interconnect is set to DEBUG.
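      The rate-limited logging can be sketched as below; the GUC names
      follow the commit text, but the modulo logic is an assumption.

```python
# Sketch: emit one debug message every N resend attempts, and only
# when interconnect logging is at DEBUG level.
def should_log_retry(retry_count, debug_retry_interval, gp_log_interconnect):
    """Decide whether this resend attempt should produce a debug message."""
    if gp_log_interconnect != "DEBUG":
        return False
    return retry_count % debug_retry_interval == 0
```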
      9a9cd48b
  23. 03 Aug 2017, 1 commit
  24. 02 Aug 2017, 1 commit
    •
      Make memory spill in resource group take effect · 68babac4
      Committed by Richard Guo
      Resource group memory spill is similar to 'statement_mem' in
      resource queues; the difference is that memory spill is calculated
      according to the memory quota of the resource group.
      
      The related GUCs, variables, and functions shared by both resource
      queues and resource groups are moved into the resource manager
      namespace.
      
      The resource queue code relating to memory policy is also
      refactored in this commit.
      Signed-off-by: Pengzhou Tang <ptang@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      68babac4
  25. 13 Jul 2017, 1 commit
    •
      Add GUC to control number of blocks that a resync worker operates on · 2960bd7c
      Committed by Asim R P
      The GUC gp_changetracking_max_rows replaces a compile-time
      constant. A resync worker obtains at most
      gp_changetracking_max_rows changed blocks from the changetracking
      log at one time. Controlling this with a GUC makes it possible to
      exercise bugs in the resync logic around this area.
      2960bd7c
  26. 29 Jun 2017, 1 commit
    •
      Implement resgroup memory limit (#2669) · b5e1fb0a
      Committed by Ning Yu
      Implement resgroup memory limit.
      
      In a resgroup we divide the memory into several slots; the number
      depends on the concurrency setting of the resgroup. Each slot has
      a reserved quota of memory, and all the slots also share some
      shared memory which can be acquired preemptively.
      
      Some GUCs and resgroup options are defined to adjust the exact allocation
      policy:
      
      resgroup options:
      - memory_shared_quota
      - memory_spill_ratio
      
      GUCs:
      - gp_resource_group_memory_limit
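      The slot model can be sketched as below: a group's memory is split
      into per-slot reservations (one per concurrent transaction) plus a
      shared region sized by memory_shared_quota. The arithmetic is
      illustrative, not GPDB's exact accounting.

```python
# Sketch: divide a resource group's memory into slot quotas plus a
# preemptively shareable region.
def slot_layout(group_memory, concurrency, memory_shared_quota):
    """Return (per_slot_quota, shared_memory) for a resource group."""
    shared = int(group_memory * memory_shared_quota)
    per_slot = (group_memory - shared) // concurrency
    return per_slot, shared
```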
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      b5e1fb0a
  27. 24 Jun 2017, 1 commit
    •
      Enable xlogging for create fs objects on segments. · 9efec6b2
      Committed by Ashwin Agrawal
      In the case of --enable-segwalrep, write-ahead logging should not
      be skipped for anything, as the mirror relies on that mechanism to
      reconstruct objects. Write-ahead logging for these pieces was
      previously only performed on the master; with this commit it is
      enabled for segments as well.
      9efec6b2
  28. 22 Jun 2017, 1 commit
    •
      Eliminating alien nodes before execution (#2588) · 9b8f5c0b
      Committed by foyzur
      In GPDB the dispatcher dispatches the entire plan tree to each query executor (QX). Each QX deserializes the entire plan tree and starts execution from the root of the plan tree. This begins by calling InitPlan on the QueryDesc, which blindly calls ExecInitNode on the root of the plan.
      
      Unfortunately, this is wasteful, in terms of memory and CPU. Each QX is in charge of a single slice. There can be many slices. Looking into plan nodes that belong to other slices, and initializing (e.g., creating PlanState for such nodes) is clearly wasteful. For large plans, particularly planner plans, in the presence of partitions, this can add up to a significant waste.
      
      This PR proposes a fix to solve this problem. The idea is to find the local root for each slice and start ExecInitNode there.
      
      There are a few special cases:
      
      SubPlans are special, as they appear as expressions, but each such expression holds the root of a sub-plan tree. All the subplans are bundled in plannedstmt->subplans, but confusingly as Plan pointers (i.e., we save the root of the SubPlan expression's Plan tree). Therefore, to find the relevant subplans, we first need to find the relevant expressions and extract their roots, and then iterate over plannedstmt->subplans, calling ExecInitNode only on the ones reachable from some expression in the current slice.
      
      InitPlans are no better, as they can appear anywhere in the Plan tree. Walking from a local motion is not sufficient to find them, so we need to walk from the root of the plan tree and identify all the SubPlans. Note: unlike a regular subplan, an initplan may not appear in an expression as a subplan; rather, it appears as a parameter generator in some other part of the tree. We need to find these InitPlans and obtain the SubPlan for each one. We can then use the SubPlan's setParam to copy precomputed parameter values from estate->es_param_list_info to estate->es_param_exec_vals.
      
      We also found that the origSliceIdInPlan is highly unreliable and cannot be used as an indicator of a plan node's slice information. Therefore, we precompute each plan node's slice information to correctly determine if a Plan node is alien or not. This makes alien node identification more accurate. In successive PRs, we plan to use the alien memory account balance as a test to see if we successfully eliminated all aliens. We will also use the alien account balance to determine memory savings.
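      The core idea can be sketched as below: annotate each plan node
      with its slice and initialize only the subtree rooted at the
      current slice's local root. The node structure and slice
      assignment here are illustrative, not GPDB's.

```python
# Sketch: a QE finds its slice's local root and runs ExecInitNode
# only from there, skipping alien (other-slice) parts of the plan.
class Plan:
    def __init__(self, slice_id, children=()):
        self.slice_id = slice_id
        self.children = list(children)

def find_local_root(node, my_slice):
    """Depth-first search for the root of this slice's subtree."""
    if node.slice_id == my_slice:
        return node
    for child in node.children:
        found = find_local_root(child, my_slice)
        if found is not None:
            return found
    return None

def exec_init_nodes(root, my_slice):
    """Count the nodes a QE initializes when starting at its local root."""
    local = find_local_root(root, my_slice)
    def count(n):
        return 1 + sum(count(c) for c in n.children)
    return count(local) if local else 0
```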
      9b8f5c0b
  29. 07 Jun 2017, 2 commits
    •
      restore TCP interconnect · 353a937d
      Committed by Pengzhou Tang
      This commit restores the TCP interconnect and fixes some hang issues.
      
      * Restore the TCP interconnect code
      * Add a GUC, gp_interconnect_tcp_listener_backlog, to control the backlog parameter of the listen() call for TCP
      * Use memmove instead of memcpy, because the memory areas do overlap
      * Call checkForCancelFromQD() for the TCP interconnect if there is no data for a while; this prevents the QD from getting stuck
      * Revert the cancelUnfinished-related modification in 8d251945, otherwise some queries will get stuck
      * Move and rename the fault injector "cursor_qe_reader_after_snapshot" to make test cases pass under the TCP interconnect
      353a937d
    •
      Misc changes of gp_log_gang · 9d5b10ae
      Committed by Pengzhou Tang
      * Change the default level of gp_log_gang to off.
      * Log the query plan size at level TERSE; it's useful for debugging.
      9d5b10ae
  30. 25 May 2017, 1 commit
  31. 19 May 2017, 2 commits
    •
      Implement resource group cpu rate limitation. · 2650f728
      Committed by Pengzhou Tang
      Resource group cpu rate limitation is implemented with cgroups on
      Linux systems. When resource groups are enabled via GUC, we check
      whether cgroups are available and properly configured on the
      system. A sub-cgroup is created for each resource group; its cpu
      quota and share weight are set depending on the resource group
      configuration. Queries run under these cgroups, and their cpu
      usage is restricted by the cgroups.
      
      The cgroups directory structures:
      * /sys/fs/cgroup/{cpu,cpuacct}/gpdb: the toplevel gpdb cgroup
      * /sys/fs/cgroup/{cpu,cpuacct}/gpdb/*/: cgroup for each resource group
      
      The logic for cpu rate limitation:
      
      * in toplevel gpdb cgroup we set the cpu quota and share weight as:
      
          cpu.cfs_quota_us := cpu.cfs_period_us * 256 * gp_resource_group_cpu_limit
          cpu.shares := 1024 * ncores
      
      * for each sub group we set the cpu quota and share weight as:
      
          sub.cpu.cfs_quota_us := -1
          sub.cpu.shares := top.cpu.shares * sub.cpu_rate_limit
      
      The minimum and maximum cpu percentage for a sub cgroup:
      
          sub.cpu.min_percentage := gp_resource_group_cpu_limit * sub.cpu_rate_limit
          sub.cpu.max_percentage := gp_resource_group_cpu_limit
      
      The actual percentage depends on how busy the system is.
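      The per-group formulas above can be written out as executable
      arithmetic (a sketch; function names are illustrative):

```python
# Sketch of the sub-cgroup formulas from the commit message:
#   sub.cpu.shares         = top.cpu.shares * sub.cpu_rate_limit
#   sub.cpu.min_percentage = gp_resource_group_cpu_limit * sub.cpu_rate_limit
#   sub.cpu.max_percentage = gp_resource_group_cpu_limit
def sub_cpu_shares(top_shares, cpu_rate_limit):
    """Share weight for a resource group's sub-cgroup."""
    return int(top_shares * cpu_rate_limit)

def sub_cpu_percentage(gp_resource_group_cpu_limit, cpu_rate_limit):
    """Return (minimum, maximum) CPU percentage for a sub-cgroup."""
    minimum = gp_resource_group_cpu_limit * cpu_rate_limit
    maximum = gp_resource_group_cpu_limit
    return minimum, maximum
```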
      
      gp_resource_group_cpu_limit is a GUC introduced to control the
      share of cpu assigned to resgroups on each host.
      
          gpconfig -c gp_resource_group_cpu_limit -v '0.9'
      
      A new pipeline is created to perform the tests, as we need
      privileged permission to enable and set up cgroups on the system.
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      2650f728
    •
      Make ICG tests pass when GPDB is compiled with disable-orca · 7e774f28
      Committed by Venkatesh Raghavan
      In the updated tests, we used functions like disable_xform and
      enable_xform to hint the optimizer to disallow/allow a particular
      physical node. However, these functions are only available when
      GPDB is built with GPORCA; the planner, on the other hand,
      accomplishes this via GUCs.
      
      To avoid using these functions in tests, I have introduced a
      couple of GUCs that mimic the same planner behavior, but now for
      GPORCA. This effort required adding an API inside GPORCA.
      7e774f28
  32. 15 May 2017, 1 commit
    •
      Streamline Orca Gucs · 9f2c838b
      Committed by Venkatesh Raghavan
      * Enable analyzing root partitions
      * Ensure that the name of the guc is clear
      * Remove double negation (where possible)
      * Update comments
      * Co-locate gucs that have similar purpose
      * Remove dead gucs
      * Classify them correctly so that they are no longer hidden
      9f2c838b
  33. 11 May 2017, 1 commit