1. 28 Jun 2017, 1 commit
  2. 22 Jun 2017, 1 commit
    • F
      Eliminating alien nodes before execution (#2588) · 9b8f5c0b
      Authored by foyzur
      In GPDB the dispatcher dispatches the entire plan tree to each query executor (QX). Each QX deserializes the entire plan tree and starts execution from the root of the plan tree. This begins by calling InitPlan on the QueryDesc, which blindly calls ExecInitNode on the root of the plan.
      
      Unfortunately, this is wasteful, in terms of memory and CPU. Each QX is in charge of a single slice. There can be many slices. Looking into plan nodes that belong to other slices, and initializing (e.g., creating PlanState for such nodes) is clearly wasteful. For large plans, particularly planner plans, in the presence of partitions, this can add up to a significant waste.
      
      This PR proposes a fix to solve this problem. The idea is to find the local root for each slice and start ExecInitNode there.
      
      There are a few special cases:
      
      SubPlans are special: they appear as expressions, but each expression holds the root of a sub-plan tree. All the subplans are bundled in plannedstmt->subplans, but confusingly as Plan pointers (i.e., we save the root of each SubPlan expression's Plan tree). Therefore, to find the relevant subplans, we first find the relevant expressions and extract their roots, and then iterate over plannedstmt->subplans, calling ExecInitNode only on the ones reachable from some expression in the current slice.
      
      InitPlans are no better, as they can appear anywhere in the Plan tree. Walking from a local motion is not sufficient to find them, so we need to walk from the root of the plan tree and identify all the SubPlans. Note: unlike a regular subplan, an initplan may not appear in an expression as a subplan; rather, it appears as a parameter generator in some other part of the tree. We need to find these InitPlans and obtain the SubPlan for each one. We can then use the SubPlan's setParam to copy precomputed parameter values from estate->es_param_list_info to estate->es_param_exec_vals.
      
      We also found that the origSliceIdInPlan is highly unreliable and cannot be used as an indicator of a plan node's slice information. Therefore, we precompute each plan node's slice information to correctly determine if a Plan node is alien or not. This makes alien node identification more accurate. In successive PRs, we plan to use the alien memory account balance as a test to see if we successfully eliminated all aliens. We will also use the alien account balance to determine memory savings.
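      The slice-local initialization described above can be sketched as follows. The structs and function names here are hypothetical simplifications (the real GPDB Plan/PlanState machinery is far larger); the sketch only illustrates precomputing per-node slice ids and starting ExecInitNode at the slice's local root so alien subtrees are never touched:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified plan node: the real GPDB Plan struct
 * is far larger.  sliceId stands in for the precomputed per-node slice
 * information the commit introduces (origSliceIdInPlan is unreliable). */
typedef struct PlanNode
{
    int              sliceId;   /* slice this node belongs to */
    struct PlanNode *left;
    struct PlanNode *right;
} PlanNode;

/* A node is "alien" to a QX if it belongs to a different slice. */
static bool
IsAlienNode(const PlanNode *node, int currentSliceId)
{
    return node->sliceId != currentSliceId;
}

/* Find the local root for currentSliceId: the topmost node of the
 * subtree this QX actually executes.  ExecInitNode would start here,
 * never visiting alien subtrees at all. */
static PlanNode *
FindLocalRoot(PlanNode *node, int currentSliceId)
{
    if (node == NULL)
        return NULL;
    if (!IsAlienNode(node, currentSliceId))
        return node;

    PlanNode *found = FindLocalRoot(node->left, currentSliceId);
    if (found == NULL)
        found = FindLocalRoot(node->right, currentSliceId);
    return found;
}
```

      The sketch relies on the property that a slice's nodes form one contiguous subtree cut at Motion boundaries, which is why a single local root exists per slice.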
      9b8f5c0b
  3. 07 Jun 2017, 1 commit
    • P
      Fix an unexpected cursor error under TCP interconnect · 09aca5fa
      Authored by Pengzhou Tang
      Under the former TCP interconnect, after declaring a cursor for an invalid
      query like "declare c1 cursor for select c1/0 from foo", the following FETCH
      command could still fetch an empty row instead of raising an error. This is
      incorrect and inconsistent with the UDP interconnect.
      
      The root cause is that senders in the TCP interconnect always send an EOF
      message to their peers regardless of errors on the segments, so the receivers
      cannot tell the difference between an EOF and an error.
      
      The solution in this commit is to not send the EOF to the peers if the sender
      has encountered an error, and to let the QD check the status of all segments
      when it cannot read data from the interconnect for a long time.
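      The sender/receiver behavior can be modeled as below. This is a toy sketch, not the real interconnect API; the function and enum names are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Sender side of the fix: an erroring sender must not emit EOF, so a
 * receiver can no longer mistake a failed sender for one that finished
 * cleanly. */
static bool
sender_should_send_eof(bool error_on_segment)
{
    return !error_on_segment;
}

/* Receiver side: EOF means the stream completed; prolonged silence
 * means the QD should check overall segment status rather than
 * reporting an empty result. */
typedef enum
{
    RECV_WAIT,              /* keep polling the interconnect */
    RECV_DATA,              /* a tuple chunk arrived */
    RECV_EOF,               /* sender finished without error */
    RECV_CHECK_SEGMENTS     /* no EOF, no data: ask QD to poll segments */
} RecvAction;

static RecvAction
receiver_next_action(bool got_data, bool got_eof,
                     int idle_ticks, int timeout_ticks)
{
    if (got_data)
        return RECV_DATA;
    if (got_eof)
        return RECV_EOF;
    if (idle_ticks >= timeout_ticks)
        return RECV_CHECK_SEGMENTS;
    return RECV_WAIT;
}
```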
      09aca5fa
  4. 13 Feb 2017, 1 commit
  5. 13 Jan 2017, 1 commit
    • H
      Always allocate DynamicTableScanInfo in es_query_cxt. · 3a10ddd5
      Authored by Heikki Linnakangas
      DynamicTableScanInfo is an extension of EState, so always allocate it in the
      same memory context. The DynamicTableScanInfo.memoryContext field always
      pointed to es_query_cxt, so that is in fact what we always did anyway; this
      just removes the unnecessary abstraction, for simplicity.
      3a10ddd5
  6. 04 Nov 2016, 1 commit
  7. 18 Aug 2016, 1 commit
  8. 03 Aug 2016, 1 commit
  9. 25 Jul 2016, 1 commit
    • P
      Refactor utility statement dispatch interfaces · 01769ada
      Authored by Pengzhou Tang
      Refactor CdbDispatchUtilityStatement() to make it flexible enough for
      cdbCopyStart() and dispatchVacuum() to call directly. Introduce flags such as
      DF_NEED_TWO_SNAPSHOT, DF_WITH_SNAPSHOT, and DF_CANCEL_ON_ERROR to make the
      function calls much clearer.
      01769ada
  10. 28 Jun 2016, 1 commit
  11. 21 Jun 2016, 1 commit
  12. 17 Jun 2016, 1 commit
  13. 13 Jun 2016, 1 commit
    • K
      Dispatch exactly same text string for all slices. · 4b360942
      Authored by Kenan Yao
      Include a map from sliceIndex to gang_id in the dispatched string, and
      remove the localSlice field; each QE now gets its localSlice from the
      map. This way, we avoid duplicating and modifying the dispatch text
      string slice by slice, and every QE of a sliced dispatch now receives
      the same contents.
      
      The extra space cost is sizeof(int) * SliceNumber bytes, and the extra
      computing cost is iterating over the SliceNumber-sized array. Compared with
      the memcpy of the text string for each slice in the previous implementation,
      this is much cheaper, because SliceNumber is much smaller than the size of
      the dispatch text string. Also, since SliceNumber is so small, we just use
      an array for the map instead of a hash table.
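      A minimal sketch of that sliceIndex-to-gang_id array map, with hypothetical struct and function names (the real dispatched-string layout is different):

```c
#include <assert.h>
#include <stdlib.h>

/* Because SliceNumber is small, a plain array beats a hash table: the
 * extra space is just sizeof(int) * numSlices bytes in the dispatched
 * string, and lookup is a scan over a tiny array. */
typedef struct SliceGangMap
{
    int  numSlices;
    int *gangIds;           /* gangIds[sliceIndex] == gang id of that slice */
} SliceGangMap;

static SliceGangMap *
make_slice_gang_map(int numSlices)
{
    SliceGangMap *map = malloc(sizeof(SliceGangMap));
    map->numSlices = numSlices;
    map->gangIds = calloc(numSlices, sizeof(int));
    return map;
}

/* A QE recovers its local slice by scanning the map for its own gang
 * id, instead of reading a localSlice field patched per slice. */
static int
find_local_slice(const SliceGangMap *map, int myGangId)
{
    for (int i = 0; i < map->numSlices; i++)
    {
        if (map->gangIds[i] == myGangId)
            return i;
    }
    return -1;              /* not part of this dispatch */
}
```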
      
      Also, clean up some dead code in the dispatcher, including:
      (1) Remove the primary_gang_id field of the Slice struct and the
      DispatchCommandDtxProtocolParms struct, since the dispatch agent is deprecated;
      (2) Remove redundant logic in cdbdisp_dispatchX;
      (3) Clean up buildGpDtxProtocolCommand;
      4b360942
  14. 09 Jun 2016, 1 commit
    • H
      Simplify counting of tupletable slots, by getting rid of the counting. · 10ca88b4
      Authored by Heikki Linnakangas
      Backport this patch from PostgreSQL 9.0, which replaces the tuple table
      array with a linked list of individually palloc'd slots. With that, we
      don't need to know the size of the array beforehand, and don't need to
      count the slots. The counting was especially funky for subplans in GPDB,
      and it was about to change again with the upcoming PostgreSQL 8.3 merge.
      This makes it a lot simpler.
      
      I don't plan to backport the follow-up patch to remove the ExecCountSlots
      infrastructure. We'll get that later, when we merge with PostgreSQL 9.0.
      
      commit f92e8a4b
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Sep 27 20:09:58 2009 +0000
      
          Replace the array-style TupleTable data structure with a simple List of
          TupleTableSlot nodes.  This eliminates the need to count in advance
          how many Slots will be needed, which seems more than worth the small
          increase in the amount of palloc traffic during executor startup.
      
          The ExecCountSlots infrastructure is now all dead code, but I'll remove it
          in a separate commit for clarity.
      
          Per a comment from Robert Haas.
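      The list-of-slots idea can be sketched in miniature. Names here are illustrative, not the real TupleTableSlot API; the point is only that slots are allocated on demand and never counted in advance:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the 9.0-style tuple table: each slot is individually
 * allocated and chained into a list, so executor startup never needs an
 * up-front count (the old ExecCountSlots pass). */
typedef struct MiniSlot
{
    struct MiniSlot *next;
    int              id;
} MiniSlot;

/* Allocate one more slot on demand, prepending to the list. */
static MiniSlot *
alloc_slot(MiniSlot **head, int id)
{
    MiniSlot *slot = malloc(sizeof(MiniSlot));
    slot->id = id;
    slot->next = *head;
    *head = slot;
    return slot;
}

/* The list length is discoverable after the fact; nothing has to
 * predict it before executor startup. */
static int
count_slots(const MiniSlot *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```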
      10ca88b4
  15. 21 May 2016, 1 commit
    • G
      refactor gang management code · 46dfa750
      Authored by Gang Xiong
      1) Add a new type of gang: the singleton reader gang.
      2) Change the interface of allocateGang.
      3) Handle exceptions during gang creation: segment down and segment reset.
      4) Clean up some dead code.
      46dfa750
  16. 19 May 2016, 1 commit
  17. 11 May 2016, 1 commit
    • S
      This commit generates code for code path: ExecVariableList > slot_getattr >... · 7b75d9ea
      Authored by Shreedhar Hardikar
      This commit generates code for the code path: ExecVariableList > slot_getattr > _slot_getsomeattrs > slot_deform_tuple. This code path is executed during scans in simple select queries that do not have a where clause (e.g., select bar from foo;).
      
      For each attribute A in the target list, the regular implementation of ExecVariableList retrieves the slot that A comes from and calls slot_getattr(). slot_getattr() eventually calls slot_deform_tuple (through _slot_getsomeattrs), which fetches all yet-unread attributes of the slot up to the given attribute.
      
      This commit generates code for the case where all the attributes in the target list use the same slot (created during a scan, i.e., ecxt_scantuple). Moreover, instead of looping over the target list one attribute at a time, it calls slot_getattr only once, with the largest attribute index from the target list.
      
      If during code generation time completion is not possible (e.g., attributes use different slots), the function returns false and the codegen manager is responsible for the cleanup.
      
      This implementation does not support:
      * Null attributes
      * Variable length attributes
      * Fixed length attributes passed by reference (e.g., uuid)
      If at execution time, we see any of the above types of attributes, we fall back to the regular function.
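      The single-deform trick can be sketched as follows. The structs and helpers are simplified stand-ins, not the real executor API; deform_upto() plays the role of slot_deform_tuple:

```c
#include <assert.h>

/* Toy slot: values[] holds the attributes deformed so far (nvalid of
 * them); raw[] stands in for the on-disk tuple. */
typedef struct MiniTupleSlot
{
    int        nvalid;      /* attributes deformed so far */
    int        values[8];   /* deformed attribute values */
    const int *raw;         /* "on-disk" attribute values */
} MiniTupleSlot;

static int deform_calls = 0;    /* how many times we "deformed" */

static void
deform_upto(MiniTupleSlot *slot, int attno)
{
    deform_calls++;
    for (int i = slot->nvalid; i < attno; i++)
        slot->values[i] = slot->raw[i];
    if (attno > slot->nvalid)
        slot->nvalid = attno;
}

/* The generated code's trick: deform once, up to the largest attribute
 * index in the target list, then read every target column directly. */
static void
fetch_targets(MiniTupleSlot *slot, const int *attnos, int ntargets, int *out)
{
    int max_attno = 0;

    for (int i = 0; i < ntargets; i++)
        if (attnos[i] > max_attno)
            max_attno = attnos[i];

    deform_upto(slot, max_attno);       /* single call, not one per column */

    for (int i = 0; i < ntargets; i++)
        out[i] = slot->values[attnos[i] - 1];
}
```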
      
      Moreover, this commit:
      * renames the existing "codegen" GUC, which is used for initializing the LLVM libraries, to "init_codegen" (note: we set this GUC in gpconfig),
      * adds a "codegen" GUC, which enables code generation and compilation at query execution time,
      * enhances the existing code generation utilities by adding a function that creates all the IR instructions for falling back to regular functions, and
      * removes the existing code that generated a naive slot_deform_tuple, which simply fell back to the regular slot_deform_tuple.
      Signed-off-by: Nikos Armenatzoglou <nikos.armenatzoglou@gmail.com>
      7b75d9ea
  18. 28 Mar 2016, 1 commit
  19. 26 Feb 2016, 1 commit
    • H
      Remove dead sendInitGpmonPkts() function. · c13ca731
      Authored by Heikki Linnakangas
      The jump table was missing some entries, and hence the code wouldn't have
      worked correctly. Fortunately it was all dead code: sendInitGpmonPkts()
      only called itself; there were no other callers.
      
      This fixes issue #256. Thanks to Craig Harris for the report!
      c13ca731
  20. 12 Jan 2016, 1 commit
    • H
      Make functions in gp_toolkit execute on all nodes as intended. · 246f7510
      Authored by Heikki Linnakangas
      Moving the installation of gp_toolkit.sql into initdb, in commit f8910c3c,
      broke all the functions that are supposed to execute on all nodes, like
      gp_toolkit.__gp_localid. After that change, gp_toolkit.sql was executed
      in utility mode, and the gp_distribution_policy entries for those functions
      were not created as a result.
      
      To fix, change the code so that gp_distribution_policy entries are never
      created, or consulted, for EXECUTE-type external tables. They have
      more fine-grained information in pg_exttable.location field anyway, so rely
      on that instead. With this change, there is no difference in whether an
      EXECUTE-type external table is created in utility mode or not. We would
      still have similar problems if gp_toolkit contained other kinds of
      external tables, but it doesn't.
      
      This removes the isMasterOnly() function and changes all its callers to
      call GpPolicyFetch() directly instead. Some places used GpPolicyFetch()
      directly to check if a table is distributed, so this just makes that the
      canonical way to do it. The checks for system schemas that used to be in
      isMasterOnly() are no longer performed, but they should've been unnecessary in
      the first place. System tables don't have gp_distribution_policy entries,
      so they'll be treated as master-only even without that check.
      246f7510
  21. 28 Oct 2015, 1 commit
  22. 30 Nov 2012, 1 commit
    • T
      Fix assorted bugs in CREATE INDEX CONCURRENTLY. · 5c8c7c7c
      Authored by Tom Lane
      This patch changes CREATE INDEX CONCURRENTLY so that the pg_index
      flag changes it makes without exclusive lock on the index are made via
      heap_inplace_update() rather than a normal transactional update.  The
      latter is not very safe because moving the pg_index tuple could result in
      concurrent SnapshotNow scans finding it twice or not at all, thus possibly
      resulting in index corruption.
      
      In addition, fix various places in the code that ought to check to make
      sure that the indexes they are manipulating are valid and/or ready as
      appropriate.  These represent bugs that have existed since 8.2, since
      a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
      index behind, and we ought not try to do anything that might fail with
      such an index.
      
      Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
      columns that are allowed to change after initial creation.  Previously we
      could have been left with stale values of some fields in an index relcache
      entry.  It's not clear whether this actually had any user-visible
      consequences, but it's at least a bug waiting to happen.
      
      This is a subset of a patch already applied in 9.2 and HEAD.  Back-patch
      into all earlier supported branches.
      
      Tom Lane and Andres Freund
      5c8c7c7c
  23. 21 Jul 2012, 1 commit
    • T
      Fix whole-row Var evaluation to cope with resjunk columns (again). · 038c36b6
      Authored by Tom Lane
      When a whole-row Var is reading the result of a subquery, we need it to
      ignore any "resjunk" columns that the subquery might have evaluated for
      GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
      68e40998, but that fix only covered
      whole-row Vars of named composite types, not those of RECORD type; and it
      was mighty klugy anyway, since it just assumed without checking that any
      extra columns in the result must be resjunk.  A proper fix requires getting
      hold of the subquery's targetlist so we can actually see which columns are
      resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
      the bullet and add some infrastructure to make that possible.
      
      Per report from Andrew Dunstan and additional testing by Merlin Moncure.
      Back-patch to all supported branches.  In 8.3, also back-patch commit
      292176a1, which for some reason I had
      not done at the time, but it's a prerequisite for this change.
      038c36b6
  24. 02 Jan 2008, 1 commit
  25. 01 Dec 2007, 1 commit
    • T
      Avoid incrementing the CommandCounter when CommandCounterIncrement is called · 895a94de
      Authored by Tom Lane
      but no database changes have been made since the last CommandCounterIncrement.
      This should result in a significant improvement in the number of "commands"
      that can typically be performed within a transaction before hitting the 2^32
      CommandId size limit.  In particular this buys back (and more) the possible
      adverse consequences of my previous patch to fix plan caching behavior.
      
      The implementation requires tracking whether the current CommandCounter
      value has been "used" to mark any tuples.  CommandCounter values stored into
      snapshots are presumed not to be used for this purpose.  This requires some
      small executor changes, since the executor used to conflate the curcid of
      the snapshot it was using with the command ID to mark output tuples with.
      Separating these concepts allows some small simplifications in executor APIs.
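      The tracking idea can be sketched in miniature. Function names echo PostgreSQL's, but the bodies here are illustrative stand-ins for the real xact.c logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Track whether the current CommandId has actually been used to mark
 * tuples, and skip the increment when it has not. */
typedef unsigned int CommandId;

static CommandId currentCommandId = 0;
static bool      commandIdUsed = false;

/* Callers that will stamp tuples with the cid must declare that use;
 * cids merely stored into snapshots would not go through here. */
static CommandId
GetCurrentCommandIdForWrite(void)
{
    commandIdUsed = true;
    return currentCommandId;
}

static void
CommandCounterIncrementSketch(void)
{
    if (commandIdUsed)
    {
        currentCommandId++;
        commandIdUsed = false;
    }
    /* else: no tuples were marked with this cid, so reuse it and stay
     * farther away from the 2^32 CommandId limit */
}
```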
      
      Something for the TODO list: look into having CommandCounterIncrement not do
      AcceptInvalidationMessages.  It seems fairly bogus to be doing it there,
      but exactly where to do it instead isn't clear, and I'm disinclined to mess
      with asynchronous behavior during late beta.
      895a94de
  26. 16 Nov 2007, 1 commit
  27. 21 Sep 2007, 1 commit
    • T
      HOT updates. When we update a tuple without changing any of its indexed · 282d2a03
      Authored by Tom Lane
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
      282d2a03
  28. 16 Aug 2007, 1 commit
    • T
      Arrange to cache a ResultRelInfo in the executor's EState for relations that · 817946bb
      Authored by Tom Lane
      are not one of the query's defined result relations, but nonetheless have
      triggers fired against them while the query is active.  This was formerly
      impossible but can now occur because of my recent patch to fix the firing
      order for RI triggers.  Caching a ResultRelInfo avoids duplicating work by
      repeatedly opening and closing the same relation, and also allows EXPLAIN
      ANALYZE to "see" and report on these extra triggers.  Use the same mechanism
      to cache open relations when firing deferred triggers at transaction shutdown;
      this replaces the former one-element-cache strategy used in that case, and
      should improve performance a bit when there are deferred triggers on a number
      of relations.
      817946bb
  29. 01 Aug 2007, 1 commit
  30. 28 Jul 2007, 1 commit
  31. 27 Feb 2007, 1 commit
    • T
      Get rid of the separate EState for subplans, and just let them share the · c7ff7663
      Authored by Tom Lane
      parent query's EState.  Now that there's a single flat rangetable for both
      the main plan and subplans, there's no need anymore for a separate EState,
      and removing it allows cleaning up some crufty code in nodeSubplan.c and
      nodeSubqueryscan.c.  Should be a tad faster too, although any difference
      will probably be hard to measure.  This is the last bit of subsidiary
      mop-up work from changing to a flat rangetable.
      c7ff7663
  32. 23 Feb 2007, 1 commit
    • T
      Turn the rangetable used by the executor into a flat list, and avoid storing · eab6b8b2
      Authored by Tom Lane
      useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
      same struct node type and just zero out the link fields for unneeded info,
      rather than making a separate ExecRangeTblEntry type --- it seemed too
      fragile to have two different rangetable representations.)
      
      Along the way, put subplans into a list in the toplevel PlannedStmt node,
      and have SubPlan nodes refer to them by list index instead of direct pointers.
      Vadim wanted to do that years ago, but I never understood what he was on about
      until now.  It makes things a *whole* lot more robust, because we can stop
      worrying about duplicate processing of subplans during expression tree
      traversals.  That's been a constant source of bugs, and it's finally gone.
      
      There are some consequent simplifications yet to be made, like not using
      a separate EState for subplans in the executor, but I'll tackle that later.
      eab6b8b2
  33. 21 Feb 2007, 1 commit
    • T
      Remove the Query structure from the executor's API. This allows us to stop · 9cbd0c15
      Authored by Tom Lane
      storing mostly-redundant Query trees in prepared statements, portals, etc.
      To replace Query, a new node type called PlannedStmt is inserted by the
      planner at the top of a completed plan tree; this carries just the fields of
      Query that are still needed at runtime.  The statement lists kept in portals
      etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
      --- no Query.  This incidentally allows us to remove some fields from Query
      and Plan nodes that shouldn't have been there in the first place.
      
      Still to do: simplify the execution-time range table; at the moment the
      range table passed to the executor still contains Query trees for subqueries.
      
      initdb forced due to change of stored rules.
      9cbd0c15
  34. 07 Feb 2007, 1 commit
    • T
      Remove typmod checking from the recent security-related patches. It turns · a8c3f161
      Authored by Tom Lane
      out that ExecEvalVar and friends don't necessarily have access to a tuple
      descriptor with correct typmod: it definitely can contain -1, and possibly
      might contain other values that are different from the Var's value.
      Arguably this should be cleaned up someday, but it's not a simple change,
      and in any case typmod discrepancies don't pose a security hazard.
      Per reports from numerous people :-(
      
      I'm not entirely sure whether the failure can occur in 8.0 --- the simple
      test cases reported so far don't trigger it there.  But back-patch the
      change all the way anyway.
      a8c3f161
  35. 02 Feb 2007, 1 commit
    • T
      Repair failure to check that a table is still compatible with a previously · 5413eef8
      Authored by Tom Lane
      made query plan.  Use of ALTER COLUMN TYPE creates a hazard for cached
      query plans: they could contain Vars that claim a column has a different
      type than it now has.  Fix this by checking during plan startup that Vars
      at relation scan level match the current relation tuple descriptor.  Since
      at that point we already have at least AccessShareLock, we can be sure the
      column type will not change underneath us later in the query.  However,
      since a backend's locks do not conflict against itself, there is still a
      hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE
      while a query is in progress in the current backend.  Seal that hole by
      rejecting ALTER TABLE whenever the target relation is already open in
      the current backend.
      
      This is a significant security hole: not only can one trivially crash the
      backend, but with appropriate misuse of pass-by-reference datatypes it is
      possible to read out arbitrary locations in the server process's memory,
      which could allow retrieving database content the user should not be able
      to see.  Our thanks to Jeff Trout for the initial report.
      
      Security: CVE-2007-0556
      5413eef8
  36. 06 Jan 2007, 1 commit
  37. 27 Dec 2006, 1 commit
    • T
      Fix failure due to accessing an already-freed tuple descriptor in a plan · 0cbc5b1e
      Authored by Tom Lane
      involving HashAggregate over SubqueryScan (this is the known case, there
      may well be more).  The bug is only latent in releases before 8.2 since they
      didn't try to access tupletable slots' descriptors during ExecDropTupleTable.
      The least bogus fix seems to be to make subqueries share the parent query's
      memory context, so that tupdescs they create will have the same lifespan as
      those of the parent query.  There are comments in the code envisioning going
      even further by not having a separate child EState at all, but that will
      require rethinking executor access to range tables, which I don't want to
      tackle right now.  Per bug report from Jean-Pierre Pelletier.
      0cbc5b1e
  38. 04 Oct 2006, 1 commit
  39. 05 Aug 2006, 1 commit
    • T
      Fix domain_in() bug exhibited by Darcy Buskermolen. The idea of an EState · c6848986
      Authored by Tom Lane
      that's shorter-lived than the expression state being evaluated in it really
      doesn't work :-( --- we end up with fn_extra caches getting deleted while
      still in use.  Rather than abandon the notion of caching expression state
      across domain_in calls altogether, I chose to make domain_in a bit cozier
      with ExprContext.  All we really need for evaluating variable-free
      expressions is an ExprContext, not an EState, so I invented the notion of a
      "standalone" ExprContext.  domain_in can prevent resource leakages by doing
      a ReScanExprContext on this rather than having to free it entirely; so we
      can make the ExprContext have the same lifespan (and particularly the same
      per_query memory context) as the expression state structs.
      c6848986
  40. 01 Aug 2006, 1 commit
    • T
      Change the relation_open protocol so that we obtain lock on a relation · 09d3670d
      Authored by Tom Lane
      (table or index) before trying to open its relcache entry.  This fixes
      race conditions in which someone else commits a change to the relation's
      catalog entries while we are in process of doing relcache load.  Problems
      of that ilk have been reported sporadically for years, but it was not
      really practical to fix until recently --- for instance, the recent
      addition of WAL-log support for in-place updates helped.
      
      Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
      concurrent update.
      09d3670d