1. 16 March 2016 (6 commits)
  2. 15 March 2016 (3 commits)
    • Allow callers of create_foreignscan_path to specify nondefault PathTarget. · 28048cba
      Committed by Tom Lane
      Although the default choice of rel->reltarget should typically be
      sufficient for scan or join paths, it's not at all sufficient for the
      purposes PathTargets were invented for; in particular not for
      upper-relation Paths.  So break API compatibility by adding a PathTarget
      argument to create_foreignscan_path().  To ease updating of existing
      code, accept a NULL value of the argument as selecting rel->reltarget.
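
      As a usage illustration, a minimal GetForeignPaths callback after this
      change might look like the sketch below; the argument list assumes the
      9.6-era signature of create_foreignscan_path(), the function name and
      cost numbers are hypothetical, and passing NULL keeps the old behavior.

      ```c
      #include "postgres.h"
      #include "optimizer/pathnode.h"

      static void
      exampleGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel,
                             Oid foreigntableid)
      {
          ForeignPath *path;

          path = create_foreignscan_path(root, baserel,
                                         NULL,  /* NULL selects rel->reltarget */
                                         baserel->rows,
                                         10,    /* illustrative startup cost */
                                         baserel->rows + 10,  /* total cost */
                                         NIL,   /* no pathkeys */
                                         NULL,  /* no required_outer */
                                         NULL,  /* no fdw_outerpath */
                                         NIL);  /* no fdw_private */
          add_path(baserel, (Path *) path);
      }
      ```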
    • Rethink representation of PathTargets. · 307c7885
      Committed by Tom Lane
      In commit 19a54114 I did not make PathTarget a subtype of Node,
      and embedded a RelOptInfo's reltarget directly into it rather than having
      a separately-allocated Node.  In hindsight that was a misguided
      micro-optimization, enabled by the fact that at that point we didn't have
      any Paths with custom PathTargets.  Now that PathTarget processing has
      been fleshed out some more, it's easier to see that it's better to have
      PathTarget as an independent Node type, even if it does cost us one more
      palloc to create a RelOptInfo.  So change it while we still can.
      
      This commit just changes the representation, without doing anything more
      interesting than that.
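
      Sketched roughly (field lists abbreviated; the struct names here are
      hypothetical stand-ins for the real PathTarget and RelOptInfo), the
      change is from an embedded sub-struct to a separately allocated Node:

      ```c
      #include "postgres.h"
      #include "nodes/nodes.h"
      #include "nodes/pg_list.h"

      /* now a full Node type, created with makeNode(PathTarget) */
      typedef struct ExamplePathTarget
      {
          NodeTag   type;      /* T_PathTarget */
          List     *exprs;     /* expressions the plan step must emit */
          /* cost and width estimates live here as well */
      } ExamplePathTarget;

      typedef struct ExampleRelOptInfo
      {
          NodeTag   type;
          /* ...many fields elided... */
          ExamplePathTarget *reltarget;  /* was embedded by value; now a
                                          * pointer, costing one extra palloc
                                          * per RelOptInfo */
      } ExampleRelOptInfo;
      ```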
    • Update more comments for 96198d94. · 6be84eeb
      Committed by Robert Haas
      Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh
      Bapat, and slightly expanded by me.
  3. 13 March 2016 (2 commits)
    • Rename auto_explain.sample_ratio to sample_rate · 7a8d8748
      Committed by Magnus Hagander
      Per suggestion from Tomas Vondra
      
      Author: Julien Rouhaud
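
      Under the renamed parameter, a configuration using the module might
      look roughly like this (values are illustrative only):

      ```
      # postgresql.conf sketch
      shared_preload_libraries = 'auto_explain'
      auto_explain.log_min_duration = '250ms'  # explain queries slower than this
      auto_explain.sample_rate = 0.1           # log only ~10% of qualifying queries
      ```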
    • Widen query numbers-of-tuples-processed counters to uint64. · 23a27b03
      Committed by Tom Lane
      This patch widens SPI_processed, EState's es_processed field, PortalData's
      portalPos field, FuncCallContext's call_cntr and max_calls fields,
      ExecutorRun's count argument, PortalRunFetch's result, and the max number
      of rows in a SPITupleTable to uint64, and deals with (I hope) all the
      ensuing fallout.  Some of these values were declared uint32 before, and
      others "long".
      
      I also removed PortalData's posOverflow field, since that logic seems
      pretty useless given that portalPos is now always 64 bits.
      
      The user-visible results are that command tags for SELECT etc will
      correctly report tuple counts larger than 4G, as will plpgsql's
      GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
      than that are still not exactly the norm, but they're becoming more
      common.
      
      Most values associated with FETCH/MOVE distances, such as PortalRun's count
      argument and the count argument of most SPI functions that have one, remain
      declared as "long".  It's not clear whether it would be worth promoting
      those to int64; but it would definitely be a large dollop of additional
      API churn on top of this, and it would only help 32-bit platforms which
      seem relatively less likely to see any benefit.
      
      Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
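
      On the C API side, the visible change is the type of SPI_processed; a
      hedged sketch (example_count_rows is a hypothetical function) of
      printing it with the UINT64_FORMAT macro:

      ```c
      #include "postgres.h"
      #include "executor/spi.h"

      static void
      example_count_rows(void)
      {
          SPI_connect();
          if (SPI_execute("SELECT * FROM generate_series(1, 1000000)",
                          true, 0) != SPI_OK_SELECT)
              elog(ERROR, "SPI_execute failed");
          /* SPI_processed is now uint64, so format with UINT64_FORMAT */
          elog(NOTICE, "processed " UINT64_FORMAT " rows", SPI_processed);
          SPI_finish();
      }
      ```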
  4. 11 March 2016 (2 commits)
    • Allow setting sample ratio for auto_explain · 92f03fe7
      Committed by Magnus Hagander
      New configuration parameter auto_explain.sample_ratio makes it
      possible to log just a fraction of the queries meeting the configured
      threshold, to reduce the amount of logging.
      
      Author: Craig Ringer and Julien Rouhaud
      Review: Petr Jelinek
    • Refactor pull_var_clause's API to make it less tedious to extend. · 364a9f47
      Committed by Tom Lane
      In commit 1d97c19a and later c1d9579d, we extended
      pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
      to maintain, though, because it means every time we add a new behavior we
      must touch every last one of the call sites, even if there's a reasonable
      default behavior that most of them could use.  Let's switch over to using a
      bitmask of flags, instead; that seems more maintainable and might save a
      nanosecond or two as well.  This commit changes no behavior in itself,
      though I'm going to follow it up with one that does add a new behavior.
      
      In passing, remove flatten_tlist(), which has not been used since 9.1
      and would otherwise need the same API changes.
      
      Removing these enums means that optimizer/tlist.h no longer needs to
      depend on optimizer/var.h.  Changing that caused a number of C files to
      need addition of #include "optimizer/var.h" (probably we can thank old
      runs of pgrminclude for that); but on balance it seems like a good change
      anyway.
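
      A call site under the new API might look like this hedged sketch (the
      wrapper is hypothetical; the PVC_* flag names are the ones this commit
      introduces):

      ```c
      #include "postgres.h"
      #include "optimizer/var.h"

      static List *
      example_collect_vars(Node *expr)
      {
          /* default behavior needs no flags; OR in only the exceptions */
          return pull_var_clause(expr,
                                 PVC_RECURSE_AGGREGATES |
                                 PVC_INCLUDE_PLACEHOLDERS);
      }
      ```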
  5. 10 March 2016 (2 commits)
    • Avoid unlikely data-loss scenarios due to rename() without fsync. · 1d4a0ab1
      Committed by Andres Freund
      Renaming a file using rename(2) is not guaranteed to be durable in the face
      of crashes. Use the previously added durable_rename()/durable_link_or_rename()
      in various places where we previously just renamed files.
      
      Most of the changed call sites are arguably not critical, but it seems
      better to err on the side of too much durability.  The most prominent
      known case where the previously missing fsyncs could cause data loss is
      crashes at the end of a checkpoint. After the actual checkpoint has been
      performed, old WAL files are recycled. When they're filled, their
      contents are fdatasynced, but we did not fsync the containing
      directory. An OS/hardware crash in an unfortunate moment could then end
      up leaving that file with its old name, but new content; WAL replay
      would thus not replay it.
      
      Reported-By: Tomas Vondra
      Author: Michael Paquier, Tomas Vondra, Andres Freund
      Discussion: 56583BDD.9060302@2ndquadrant.com
      Backpatch: All supported branches
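
      The underlying technique, in a generic POSIX sketch rather than
      PostgreSQL's actual durable_rename() (which also fsyncs the files
      themselves and reports errors through elog/ereport):

      ```c
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      static int
      rename_durably(const char *oldpath, const char *newpath,
                     const char *dirpath)
      {
          int dirfd;

          if (rename(oldpath, newpath) < 0)
              return -1;

          /* rename(2) alone may not survive a crash: fsync the directory
           * so the changed directory entry itself reaches stable storage */
          dirfd = open(dirpath, O_RDONLY);
          if (dirfd < 0)
              return -1;
          if (fsync(dirfd) < 0)
          {
              close(dirfd);
              return -1;
          }
          return close(dirfd);
      }
      ```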
    • pgcrypto: support changing S2K iteration count · 188f359d
      Committed by Alvaro Herrera
      pgcrypto already supports key-stretching during symmetric encryption,
      including the salted-and-iterated method; but the number of iterations
      was not configurable.  This commit implements a new s2k-count parameter
      to pgp_sym_encrypt() which permits selecting a larger number of
      iterations.
      
      Author: Jeff Janes
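
      A hedged usage sketch via libpq, assuming a local database with the
      pgcrypto extension installed; 65011712 is the documented maximum
      s2k-count, and larger counts make passphrase brute-forcing costlier:

      ```c
      #include <stdio.h>
      #include <libpq-fe.h>

      int
      main(void)
      {
          PGconn   *conn = PQconnectdb("dbname=postgres");
          PGresult *res;

          if (PQstatus(conn) != CONNECTION_OK)
          {
              fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
              return 1;
          }
          res = PQexec(conn,
                       "SELECT pgp_sym_encrypt('secret message', 'passphrase', "
                       "'s2k-count=65011712')");
          if (PQresultStatus(res) == PGRES_TUPLES_OK)
              printf("ciphertext: %s\n", PQgetvalue(res, 0, 0));
          PQclear(res);
          PQfinish(conn);
          return 0;
      }
      ```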
  6. 09 March 2016 (3 commits)
  7. 08 March 2016 (1 commit)
    • Add pg_visibility contrib module. · ba0a198f
      Committed by Robert Haas
      This lets you examine the visibility map as well as page-level
      visibility information.  I initially wrote it as a debugging aid,
      but was encouraged to polish it for commit.
      
      Patch by me, reviewed by Masahiko Sawada.
      
      Discussion: 56D77803.6080503@BlueTreble.com
  8. 06 March 2016 (3 commits)
    • logical decoding: Fix handling of large old tuples with replica identity full. · c8f621c4
      Committed by Andres Freund
      When decoding the old version of an UPDATE or DELETE change, and if that
      tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
      failed in more subtle ways in non-assert builds.  Normally individual
      tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
      But that's not the case for the old version of a tuple for logical
      decoding; the replica identity is logged as one piece. With the default
      replica identity, btree limits that to small tuples, but that's not the
      case for FULL.
      
      Change the tuple buffer infrastructure to separate allocate over-large
      tuples, instead of always going through the slab cache.
      
      This unfortunately requires changing the ReorderBufferTupleBuf
      definition, since we need to store the allocated size someplace. To avoid
      requiring output plugins to recompile, don't store HeapTupleHeaderData
      directly after HeapTupleData, but point to it via t_data; that leaves
      room for the allocated size.  As there's no reason for an output plugin
      to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
      was just a minor convenience having it directly accessible.
      
      Reported-By: Adam Dratwiński
      Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
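
      The revised buffer shape, in an abbreviated sketch (the struct name
      here is a hypothetical stand-in; see reorderbuffer.h for the real
      ReorderBufferTupleBuf):

      ```c
      #include "postgres.h"
      #include "access/htup.h"

      typedef struct ExampleTupleBuf
      {
          /* tuple.t_data no longer points at an embedded header; it points
           * just past the struct, so over-large tuples can be allocated
           * separately instead of through the slab cache */
          HeapTupleData tuple;
          Size          alloc_tuple_size;  /* bytes allocated for tuple data */
          /* actual tuple data follows the struct */
      } ExampleTupleBuf;
      ```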
    • logical decoding: old/newtuple in spooled UPDATE changes was switched around. · 0bda14d5
      Committed by Andres Freund
      Somehow I managed to flip the order of restoring old & new tuples when
      de-spooling a change in a large transaction from disk. This happens to
      only take effect when a change is spooled to disk which has old/new
      versions of the tuple. That is only the case for UPDATEs where the
      primary key changed or where the replica identity was changed to FULL.
      
      The tests didn't catch this because either spooled updates, or updates
      that changed primary keys, were tested; not both at the same time.
      
      Found while adding tests for the following commit.
      
      Backpatch: 9.4, where logical decoding was added
    • logical decoding: Tell reorderbuffer about all xids. · d9e903f3
      Committed by Andres Freund
      Logical decoding's reorderbuffer keeps transactions in an LSN ordered
      list for efficiency. To make that efficiently possible, upper-level
      xids are forced to be logged before nested subtransaction xids.  That
      only works, though, if these records are all looked at; unfortunately we
      didn't do so for, e.g., row-level locks, which are otherwise uninteresting
      for logical decoding.
      
      This could lead to errors like:
      "ERROR: subxact logged without previous toplevel record".
      
      It's not sufficient to just look at row-locking records; the xid could
      appear first due to a lot of other types of records (which will trigger
      the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
      So invent infrastructure to tell reorderbuffer about xids seen, when
      they'd otherwise not pass through reorderbuffer.c.
      
      Reported-By: Jarred Ward
      Bug: #13844
      Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
      Backpatch: 9.4, where logical decoding was added
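
      The resulting pattern in the decode routines, roughly (the function
      below is a hypothetical stand-in for the per-record handlers in
      decode.c):

      ```c
      #include "postgres.h"
      #include "replication/reorderbuffer.h"

      static void
      example_decode_record(ReorderBuffer *rb, TransactionId xid,
                            XLogRecPtr lsn)
      {
          /* always let the reorderbuffer see the xid first, even for
           * record types logical decoding otherwise ignores */
          ReorderBufferProcessXid(rb, xid, lsn);

          /* ...record-type-specific decoding, if any, goes here... */
      }
      ```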
  9. 05 March 2016 (1 commit)
    • postgres_fdw: When sending ORDER BY, always include NULLS FIRST/LAST. · 3bea3f88
      Committed by Robert Haas
      Previously, we included NULLS FIRST when appropriate but relied on the
      default behavior to be NULLS LAST.  This is, however, not true for a
      sort in descending order and seems like a fragile assumption anyway.
      
      Report by Rajkumar Raghuwanshi.  Patch by Ashutosh Bapat.  Review
      comments from Michael Paquier and Tom Lane.
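
      The deparsing rule amounts to the following sketch (not the exact
      postgres_fdw code; the helper name is hypothetical):

      ```c
      #include "postgres.h"
      #include "access/stratnum.h"
      #include "lib/stringinfo.h"
      #include "nodes/relation.h"

      static void
      example_append_order_suffix(StringInfo buf, PathKey *pathkey)
      {
          if (pathkey->pk_strategy == BTLessStrategyNumber)
              appendStringInfoString(buf, " ASC");
          else
              appendStringInfoString(buf, " DESC");

          /* never rely on the remote default, which differs for ASC/DESC */
          if (pathkey->pk_nulls_first)
              appendStringInfoString(buf, " NULLS FIRST");
          else
              appendStringInfoString(buf, " NULLS LAST");
      }
      ```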
  10. 04 March 2016 (1 commit)
  11. 03 March 2016 (1 commit)
    • logical decoding: fix decoding of a commit's commit time. · 7c17aac6
      Committed by Andres Freund
      When adding replication origins in 5aa23504, I somehow managed to set
      the timestamp of decoded transactions to InvalidXLogRecptr when decoding
      one made without a replication origin. Fix that, and the wrong type of
      the new commit_time variable.
      
      This didn't trigger a regression test failure because we explicitly
      don't show commit timestamps in the regression tests, as they obviously
      are variable. Add a test that checks that a decoded commit's timestamp
      is within minutes of NOW() from before the commit.
      
      Reported-By: Weiping Qu
      Diagnosed-By: Artur Zakirov
      Discussion: 56D4197E.9050706@informatik.uni-kl.de,
          56D42918.1010108@postgrespro.ru
      Backpatch: 9.5, where 5aa23504 originates.
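
      The corrected logic, as a hedged sketch assuming the
      xl_xact_parsed_commit field names from access/xact.h (the helper is
      hypothetical; the variable is now a TimestampTz, not an XLogRecPtr):

      ```c
      #include "postgres.h"
      #include "access/xact.h"
      #include "utils/timestamp.h"

      static TimestampTz
      example_commit_time(xl_xact_parsed_commit *parsed)
      {
          /* non-replicated commits fall back to the local commit time */
          TimestampTz commit_time = parsed->xact_time;

          if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
              commit_time = parsed->origin_timestamp;
          return commit_time;
      }
      ```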
  12. 02 March 2016 (1 commit)
    • Change the format of the VM fork to add a second bit per page. · a892234f
      Committed by Robert Haas
      The new bit indicates whether every tuple on the page is already frozen.
      It is cleared only when the all-visible bit is cleared, and it can be
      set only when we vacuum a page and find that every tuple on that page is
      both visible to every transaction and in no need of any future
      vacuuming.
      
      A future commit will use this new bit to optimize away full-table scans
      that would otherwise be triggered by XID wraparound considerations.  A
      page which is merely all-visible must still be scanned in that case, but
      a page which is all-frozen need not be.  This commit does not attempt
      that optimization, although that optimization is the goal here.  It
      seems better to get the basic infrastructure in place first.
      
      Per discussion, it's very desirable for pg_upgrade to automatically
      migrate existing VM forks from the old format to the new format.  That,
      too, will be handled in a follow-on patch.
      
      Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
      Kapila, Simon Riggs, Andres Freund, and others, and substantially
      revised by me.
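
      For reference, the two per-page bits as defined in the committed
      visibilitymap.h (shown as a sketch; BITS_PER_HEAPBLOCK accordingly
      doubles from 1 to 2):

      ```c
      #define VISIBILITYMAP_ALL_VISIBLE  0x01  /* all tuples visible to all */
      #define VISIBILITYMAP_ALL_FROZEN   0x02  /* all tuples already frozen */
      #define VISIBILITYMAP_VALID_BITS   0x03  /* OR of all valid bits */
      ```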
  13. 28 February 2016 (1 commit)
  14. 26 February 2016 (2 commits)
  15. 21 February 2016 (1 commit)
    • postgres_fdw: Avoid sharing list substructure. · dd077ef8
      Committed by Robert Haas
      list_concat(list_concat(a, b), c) destructively changes both a and b;
      to avoid such perils, copy lists of remote_conds before incorporating
      them into larger lists via list_concat().
      
      Ashutosh Bapat, per a report from Etsuro Fujita
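
      The hazard and the fix, sketched (the combining function is
      hypothetical; list_copy() is the real escape hatch):

      ```c
      #include "postgres.h"
      #include "nodes/pg_list.h"

      static List *
      example_combine(List *a, List *b, List *c)
      {
          /* UNSAFE: list_concat() reuses its first argument's cells, so
           * this clobbers a and chains b's cells into the result:
           *     return list_concat(list_concat(a, b), c);
           */

          /* SAFE: operate on copies, leaving the inputs untouched */
          return list_concat(list_concat(list_copy(a), list_copy(b)),
                             list_copy(c));
      }
      ```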
  16. 19 February 2016 (2 commits)
    • Add an explicit representation of the output targetlist to Paths. · 19a54114
      Committed by Tom Lane
      Up to now, there's been an assumption that all Paths for a given relation
      compute the same output column set (targetlist).  However, there are good
      reasons to remove that assumption.  For example, an indexscan on an
      expression index might be able to return the value of an expensive function
      "for free".  While we have the ability to generate such a plan today in
      simple cases, we don't have a way to model that it's cheaper than a plan
      that computes the function from scratch, nor a way to create such a plan
      in join cases (where the function computation would normally happen at
      the topmost join node).  Also, we need this so that we can have Paths
      representing post-scan/join steps, where the targetlist may well change
      from one step to the next.  Therefore, invent a "struct PathTarget"
      representing the columns we expect a plan step to emit.  It's convenient
      to include the output tuple width and tlist evaluation cost in this struct,
      and there will likely be additional fields in future.
      
      While Path nodes that actually do have custom outputs will need their own
      PathTargets, it will still be true that most Paths for a given relation
      will compute the same tlist.  To reduce the overhead added by this patch,
      keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
      that column set to just point to their parent RelOptInfo's reltarget.
      (In the patch as committed, actually every Path is like that, since we
      do not yet have any cases of custom PathTargets.)
      
      I took this opportunity to provide some more-honest costing of
      PlaceHolderVar evaluation.  Up to now, the assumption that "scan/join
      reltargetlists have cost zero" was applied not only to Vars, where it's
      reasonable, but also PlaceHolderVars where it isn't.  Now, we add the eval
      cost of a PlaceHolderVar's expression to the first plan level where it can
      be computed, by including it in the PathTarget cost field and adding that
      to the cost estimates for Paths.  This isn't perfect yet but it's much
      better than before, and there is a way forward to improve it more.  This
      costing change affects the join order chosen for a couple of the regression
      tests, changing expected row ordering.
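
      The sharing arrangement, in a hedged sketch (the constructor name is
      hypothetical, and the field names assume the 9.6-era Path.pathtarget
      and RelOptInfo.reltarget, after the representation change above):

      ```c
      #include "postgres.h"
      #include "nodes/relation.h"

      static Path *
      example_make_default_path(RelOptInfo *rel)
      {
          Path *pathnode = makeNode(Path);

          pathnode->parent = rel;
          /* emits the default column set: just borrow the parent's target
           * rather than building a private PathTarget */
          pathnode->pathtarget = rel->reltarget;
          /* ...cost fields etc. filled in by the usual costing code... */
          return pathnode;
      }
      ```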
    • Fix multiple bugs in contrib/pgstattuple's pgstatindex() function. · 48e6c943
      Committed by Tom Lane
      Dead or half-dead index leaf pages were incorrectly reported as live, as a
      consequence of a code rearrangement I made (during a moment of severe brain
      fade, evidently) in commit d287818e.
      
      The index metapage was not counted in index_size, causing that result to
      not agree with the actual index size on-disk.
      
      Index root pages were not counted in internal_pages, which is inconsistent
      compared to the case of a root that's also a leaf (one-page index), where
      the root would be counted in leaf_pages.  Aside from that inconsistency,
      this could lead to additional transient discrepancies between the reported
      page counts and index_size, since it's possible for pgstatindex's scan to
      see zero or multiple pages marked as BTP_ROOT, if the root moves due to
      a split during the scan.  With these fixes, index_size will always be
      exactly one page more than the sum of the displayed page counts.
      
      Also, the index_size result was incorrectly documented as being measured in
      pages; it's always been measured in bytes.  (While fixing that, I couldn't
      resist doing some small additional wordsmithing on the pgstattuple docs.)
      
      Including the metapage causes the reported index_size to not be zero for
      an empty index.  To preserve the desired property that the pgstattuple
      regression test results are platform-independent (ie, BLCKSZ configuration
      independent), scale the index_size result in the regression tests.
      
      The documentation issue was reported by Otsuka Kenji, and the inconsistent
      root page counting by Peter Geoghegan; the other problems noted by me.
      Back-patch to all supported branches, because this has been broken for
      a long time.
  17. 13 February 2016 (1 commit)
  18. 10 February 2016 (3 commits)
    • postgres_fdw: Remove unnecessary variable. · 019e7881
      Committed by Robert Haas
      It causes warnings in non-Assert-enabled builds.
      
      Per report from Jeff Janes.
    • postgres_fdw: Remove unstable regression test. · bb4df42e
      Committed by Robert Haas
      Per Tom Lane and the buildfarm.
    • postgres_fdw: Push down joins to remote servers. · e4106b25
      Committed by Robert Haas
      If we've got a relatively straightforward join between two tables,
      this pushes that join down to the remote server instead of fetching
      the rows for each table and performing the join locally.  Some cases
      are not handled yet, such as SEMI and ANTI joins.  Also, we don't
      yet attempt to create presorted join paths or parameterized join
      paths even though these options do get tried for a base relation
      scan.  Nevertheless, this seems likely to be a very significant win
      in many practical cases.
      
      Shigeru Hanada and Ashutosh Bapat, reviewed by Robert Haas, with
      additional review at various points by Tom Lane, Etsuro Fujita,
      KaiGai Kohei, and Jeevan Chalke.
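
      The entry point for this is the FDW join-path hook; its signature is
      from fdwapi.h, while the body below is only a hypothetical outline of
      what postgres_fdw now does:

      ```c
      #include "postgres.h"
      #include "foreign/fdwapi.h"

      static void
      exampleGetForeignJoinPaths(PlannerInfo *root,
                                 RelOptInfo *joinrel,
                                 RelOptInfo *outerrel,
                                 RelOptInfo *innerrel,
                                 JoinType jointype,
                                 JoinPathExtraData *extra)
      {
          /* bail out for join types that are not handled yet */
          if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
              return;

          /* otherwise: verify every join clause is safe to ship, estimate
           * costs, then build a ForeignPath for the joinrel with
           * create_foreignscan_path() and register it via add_path() */
      }
      ```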
  19. 09 February 2016 (1 commit)
  20. 07 February 2016 (1 commit)
  21. 05 February 2016 (2 commits)