1. 26 Jul, 2012: 1 commit
    • Fix longstanding crash-safety bug with newly-created-or-reset sequences. · 8468bcc8
      Committed by Tom Lane
      If a crash occurred immediately after the first nextval() call for a serial
      column, WAL replay would restore the sequence to a state in which it
      appeared that no nextval() had been done, thus allowing the first sequence
      value to be returned again by the next nextval() call; as reported in
      bug #6748 from Xiangming Mei.
      
      More generally, the problem would occur if an ALTER SEQUENCE was executed
      on a freshly created or reset sequence.  (The manifestation with serial
      columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
      to serial column creation.)  The cause is that sequence creation attempted
      to save one WAL entry by writing out a WAL record that made it appear that
      the first nextval() had already happened (viz, with is_called = true),
      while marking the sequence's in-database state with log_cnt = 1 to show
      that the first nextval() need not emit a WAL record.  However, ALTER
      SEQUENCE would emit a new WAL entry reflecting the actual in-database state
      (with is_called = false).  Then, nextval would allocate the first sequence
      value and set is_called = true, but it would trust the log_cnt value and
      not emit any WAL record.  A crash at this point would thus restore the
      sequence to its post-ALTER state, causing the next nextval() call to return
      the first sequence value again.
      
      To fix, get rid of the idea of logging an is_called status different from
      reality.  This means that the first nextval-driven WAL record will happen
      at the first nextval call not the second, but the marginal cost of that is
      pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
      log_cnt to zero in any case where it touches sequence parameters that
      affect future nextval results.  This will result in some user-visible
      changes in the contents of a sequence's log_cnt column, as reflected in the
      patch's regression test changes; but no application should be depending on
      that anyway, since it was already true that log_cnt changes rather
      unpredictably depending on checkpoint timing.
      
      In addition, make some basically-cosmetic improvements to get rid of
      sequence.c's undesirable intimacy with page layout details.  It was always
      really trying to WAL-log the contents of the sequence tuple, so we should
      have it do that directly using a HeapTuple's t_data and t_len, rather than
      backing into it with some magic assumptions about where the tuple would be
      on the sequence's page.
      
      Back-patch to all supported branches.
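      The WAL/log_cnt interaction described above can be replayed in a toy model. This is a hypothetical C sketch, not PostgreSQL's sequence.c. The names ToySeq, SeqState, TOY_LOG_VALS and the toy_* functions are invented. WAL is reduced to one saved copy of the most recently logged state, and "crash recovery" restores that copy. With old_behavior = true the sketch reproduces the pre-fix creation trick (log is_called = true while the page says log_cnt = 1). With false it applies the fix (log the real state, and have ALTER reset log_cnt to zero).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy sequence state; not PostgreSQL's real structs. */
typedef struct {
    long last_value;
    bool is_called;
    int  log_cnt;        /* values that may be handed out without new WAL */
} SeqState;

typedef struct {
    SeqState live;       /* state on the sequence's page */
    SeqState wal;        /* latest WAL image; replay restores this */
} ToySeq;

#define TOY_LOG_VALS 32  /* assumed prefetch size, standing in for SEQ_LOG_VALS */

void toy_create(ToySeq *s, long start, bool old_behavior)
{
    s->live = (SeqState){start, false, 0};
    s->wal = s->live;
    if (old_behavior) {
        s->wal.is_called = true;   /* WAL pretends the first nextval() happened */
        s->live.log_cnt = 1;       /* ...so that call skips its WAL record */
    }
}

void toy_alter(ToySeq *s, bool old_behavior)
{
    if (!old_behavior)
        s->live.log_cnt = 0;       /* fix: next nextval() must emit WAL */
    s->wal = s->live;              /* ALTER logs the real on-page state */
}

long toy_nextval(ToySeq *s)
{
    long result = s->live.is_called ? s->live.last_value + 1
                                    : s->live.last_value;

    assert(s->live.log_cnt >= 0);
    if (s->live.log_cnt == 0) {
        /* Log a state TOY_LOG_VALS values ahead; values up to that point
         * can then be handed out without further WAL records. */
        s->wal = (SeqState){result + TOY_LOG_VALS, true, 0};
        s->live.log_cnt = TOY_LOG_VALS;
    }
    s->live.log_cnt--;
    s->live.last_value = result;
    s->live.is_called = true;
    return result;
}

void toy_crash_and_replay(ToySeq *s)
{
    s->live = s->wal;              /* anything not in WAL is lost */
}
```

      On the old path, create + ALTER + nextval() returns 1, and after the simulated crash nextval() returns 1 again. On the fixed path the post-crash call returns 34. That gap is harmless, since sequences promise uniqueness, not gaplessness.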
  2. 21 Jul, 2012: 1 commit
    • Fix whole-row Var evaluation to cope with resjunk columns (again). · 038c36b6
      Committed by Tom Lane
      When a whole-row Var is reading the result of a subquery, we need it to
      ignore any "resjunk" columns that the subquery might have evaluated for
      GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
      68e40998, but that fix only covered
      whole-row Vars of named composite types, not those of RECORD type; and it
      was mighty klugy anyway, since it just assumed without checking that any
      extra columns in the result must be resjunk.  A proper fix requires getting
      hold of the subquery's targetlist so we can actually see which columns are
      resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
      the bullet and add some infrastructure to make that possible.
      
      Per report from Andrew Dunstan and additional testing by Merlin Moncure.
      Back-patch to all supported branches.  In 8.3, also back-patch commit
      292176a1, which for some reason I had
      not done at the time, but it's a prerequisite for this change.
  3. 18 Jul, 2012: 1 commit
    • Improve coding around the fsync request queue. · 9c2626ac
      Committed by Tom Lane
      In all branches back to 8.3, this patch fixes a questionable assumption in
      CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
      no uninitialized pad bytes in the request queue structs.  This would only
      cause trouble if (a) there were such pad bytes, which could happen in 8.4
      and up if the compiler makes enum ForkNumber narrower than 32 bits, but
      otherwise would require not-currently-planned changes in the widths of
      other typedefs; and (b) the kernel has not uniformly initialized the
      contents of shared memory to zeroes.  Still, it seems a tad risky, and we
      can easily remove any risk by pre-zeroing the request array for ourselves.
      In addition to that, we need to establish a coding rule that struct
      RelFileNode can't contain any padding bytes, since such structs are copied
      into the request array verbatim.  (There are other places that are assuming
      this anyway, it turns out.)
      
      In 9.1 and up, the risk was a bit larger because we were also effectively
      assuming that struct RelFileNodeBackend contained no pad bytes, and with
      fields of different types in there, that would be much easier to break.
      However, there is no good reason to ever transmit fsync or delete requests
      for temp files to the bgwriter/checkpointer, so we can revert the request
      structs to plain RelFileNode, getting rid of the padding risk and saving
      some marginal number of bytes and cycles in fsync queue manipulation while
      we are at it.  The savings might be more than marginal during deletion of
      a temp relation, because the old code transmitted an entirely useless but
      nonetheless expensive-to-process ForgetRelationFsync request to the
      background process, and also had the background process perform the file
      deletion even though that can safely be done immediately.
      
      In addition, make some cleanup of nearby comments and small improvements to
      the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
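      The pad-byte hazard arises because duplicate requests are detected by comparing whole structs byte for byte. Below is a minimal sketch of the pre-zeroing fix. It assumes a hypothetical ToyRequest (a char followed by an int, which on typical ABIs leaves pad bytes). It also uses a simple quadratic duplicate scan in place of the hash table the real CompactCheckpointerRequestQueue uses, and it keeps the later copy of each duplicate.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request: the char member typically leaves pad bytes before
 * the int, and memcmp() compares those bytes too. */
typedef struct {
    unsigned char forknum;
    unsigned int  blocknum;
} ToyRequest;

#define TOY_QUEUE_SIZE 8

typedef struct {
    int        num_requests;
    ToyRequest requests[TOY_QUEUE_SIZE];
} ToyQueue;

void toy_queue_init(ToyQueue *q)
{
    /* Zero everything ourselves, pad bytes included, instead of trusting
     * the memory to arrive zeroed. */
    memset(q, 0, sizeof(*q));
}

int toy_queue_push(ToyQueue *q, unsigned char forknum, unsigned int blocknum)
{
    if (q->num_requests >= TOY_QUEUE_SIZE)
        return 0;                         /* queue full */
    ToyRequest *r = &q->requests[q->num_requests++];
    r->forknum = forknum;                 /* member stores leave the zeroed */
    r->blocknum = blocknum;               /* pad bytes alone in practice    */
    return 1;
}

/* Drop each request that has an identical later copy; returns the count. */
int toy_queue_compact(ToyQueue *q)
{
    int kept = 0;

    assert(q->num_requests <= TOY_QUEUE_SIZE);
    for (int i = 0; i < q->num_requests; i++) {
        int dup = 0;
        for (int j = i + 1; j < q->num_requests && !dup; j++)
            dup = memcmp(&q->requests[i], &q->requests[j],
                         sizeof(ToyRequest)) == 0;
        if (!dup)                         /* memmove copies pad bytes too */
            memmove(&q->requests[kept++], &q->requests[i], sizeof(ToyRequest));
    }
    int removed = q->num_requests - kept;
    q->num_requests = kept;
    return removed;
}
```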
  4. 16 Jul, 2012: 1 commit
    • Prevent corner-case core dump in rfree(). · b277a938
      Committed by Tom Lane
      rfree() failed to cope with the case that pg_regcomp() had initialized the
      regex_t struct but then failed to allocate any memory for re->re_guts (ie,
      the first malloc call in pg_regcomp() failed).  It would try to touch the
      guts struct anyway, and thus dump core.  This is a sufficiently narrow
      corner case that it's not surprising it's never been seen in the field;
      but still a bug is a bug, so patch all active branches.
      
      Noted while investigating whether we need to call pg_regfree after a
      failure return from pg_regcomp.  Other than this bug, it turns out we
      don't, so adjust comments appropriately.
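      The failure mode is a cleanup routine that assumes its constructor got further than it actually did. Below is a minimal sketch of the defensive pattern. It uses a hypothetical ToyRegex type rather than the real regex_t/re_guts layout, and fail_alloc simulates the first malloc() inside compilation returning NULL.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAGIC 0x1234

typedef struct { int *colormap; } ToyGuts;

typedef struct {
    int      magic;   /* set before any allocation is attempted */
    ToyGuts *guts;    /* still NULL if compilation failed early */
} ToyRegex;

/* Returns 0 on success, -1 if the (simulated) first allocation fails. */
int toy_regcomp(ToyRegex *re, int fail_alloc)
{
    assert(re != NULL);
    re->magic = TOY_MAGIC;
    re->guts = fail_alloc ? NULL : malloc(sizeof(ToyGuts));
    if (re->guts == NULL)
        return -1;
    re->guts->colormap = NULL;
    return 0;
}

void toy_regfree(ToyRegex *re)
{
    if (re == NULL || re->magic != TOY_MAGIC)
        return;
    re->magic = 0;
    if (re->guts == NULL)          /* the missing check: nothing to free */
        return;
    free(re->guts->colormap);      /* free(NULL) is a no-op */
    free(re->guts);
    re->guts = NULL;
}
```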
  5. 11 Jul, 2012: 2 commits
  6. 10 Jul, 2012: 1 commit
    • Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns. · 1590731e
      Committed by Tom Lane
      Previously, pattern_fixed_prefix() was defined to return whatever fixed
      prefix it could extract from the pattern, plus the "rest" of the pattern.
      That definition was sensible for LIKE patterns, but not so much for
      regexes, where reconstituting a valid pattern minus the prefix could be
      quite tricky (certainly the existing code wasn't doing that correctly).
      Since the only thing that callers ever did with the "rest" of the pattern
      was to pass it to like_selectivity() or regex_selectivity(), let's cut out
      the middle-man and just have pattern_fixed_prefix's subroutines do this
      directly.  Then pattern_fixed_prefix can return a simple selectivity
      number, and the question of how to cope with partial patterns is removed
      from its API specification.
      
      While at it, adjust the API spec so that callers who don't actually care
      about the pattern's selectivity (which is a lot of them) can pass NULL for
      the selectivity pointer to skip doing the work of computing a selectivity
      estimate.
      
      This patch is only an API refactoring that doesn't actually change any
      processing, other than allowing a little bit of useless work to be skipped.
      However, it's necessary infrastructure for my upcoming fix to regex prefix
      extraction, because after that change there won't be any simple way to
      identify the "rest" of the regex, not even to the low level of fidelity
      needed by regex_selectivity.  We can cope with that if regex_fixed_prefix
      and regex_selectivity communicate directly, but not if we have to work
      within the old API.  Hence, back-patch to all active branches.
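      The new API shape can be sketched for LIKE patterns: return the fixed prefix, let the subroutine estimate the rest of the pattern itself, and accept NULL to skip that estimate. This is a hypothetical toy, not selfuncs.c. toy_like_fixed_prefix and both selectivity constants are assumptions, the escape character is fixed at backslash, and the remainder estimate is a crude per-character product.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIXED_CHAR_SEL 0.20  /* assumed selectivity of one literal char */
#define TOY_ANY_CHAR_SEL   0.90  /* assumed selectivity of '_' */

/* Copy the fixed prefix of a LIKE pattern into prefix (at most cap - 1
 * bytes) and, only if rest_selec is non-NULL, estimate the selectivity of
 * the remainder.  Returns 1 if the whole pattern is literal, else 0. */
int toy_like_fixed_prefix(const char *patt, char *prefix, size_t cap,
                          double *rest_selec)
{
    const char *p = patt;
    size_t n = 0;

    assert(cap > 0);
    while (*p != '\0' && *p != '%' && *p != '_') {
        if (*p == '\\' && p[1] != '\0')
            p++;                     /* backslash escapes the next char */
        if (n + 1 < cap)
            prefix[n++] = *p;
        p++;
    }
    prefix[n] = '\0';
    int exact = (*p == '\0');

    if (rest_selec != NULL) {        /* callers that don't care pass NULL */
        double sel = 1.0;
        for (; *p != '\0'; p++) {
            if (*p == '%')
                continue;            /* '%' is treated as free */
            if (*p == '_')
                sel *= TOY_ANY_CHAR_SEL;
            else {
                if (*p == '\\' && p[1] != '\0')
                    p++;             /* escaped char counts as a literal */
                sel *= TOY_FIXED_CHAR_SEL;
            }
        }
        *rest_selec = sel;
    }
    return exact;
}
```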
  7. 06 Jul, 2012: 1 commit
    • Don't try to trim "../" in join_path_components(). · 79400281
      Committed by Tom Lane
      join_path_components() tried to remove leading ".." components from its
      tail argument, but it was not nearly bright enough to do so correctly
      unless the head argument was (a) absolute and (b) canonicalized.
      Rather than try to fix that logic, let's just get rid of it: there is no
      correctness reason to remove "..", and cosmetic concerns can be taken
      care of by a subsequent canonicalize_path() call.  Per bug #6715 from
      Greg Davidson.
      
      Back-patch to all supported branches.  It appears that pre-9.2, this
      function is only used with absolute paths as head arguments, which is why
      we'd not noticed the breakage before.  However, third-party code might be
      expecting this function to work in more general cases, so it seems wise
      to back-patch.
      
      In HEAD and 9.2, also make some minor cosmetic improvements to callers.
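      With the trimming gone, the join is a plain concatenation, and any ".." is left for canonicalize_path() to tidy up. Below is a minimal sketch under that reading. It uses a hypothetical toy_join_path with an explicit buffer size in place of MAXPGPATH and assumes '/'-separated paths.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join head and tail with a single '/', leaving any ".." in tail alone.
 * ret may be the same buffer as head; tail must not overlap ret. */
void toy_join_path(char *ret, size_t cap, const char *head, const char *tail)
{
    assert(cap > 0);
    if (ret != head)
        snprintf(ret, cap, "%s", head);
    if (*tail != '\0') {
        size_t len = strlen(ret);
        /* No separator if head is empty or already ends in '/'. */
        const char *sep = (len > 0 && ret[len - 1] != '/') ? "/" : "";
        snprintf(ret + len, cap - len, "%s%s", sep, tail);
    }
}
```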
  8. 01 Jul, 2012: 1 commit
    • Prevent CREATE TABLE LIKE/INHERITS from (mis) copying whole-row Vars. · 188a0a00
      Committed by Tom Lane
      If a CHECK constraint or index definition contained a whole-row Var (that
      is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or
      table inheritance produced incorrect results: the copied Var still claimed
      to have the rowtype of the source table, rather than the created table.
      
      For the LIKE case, it seems reasonable to just throw error for this
      situation, since the point of LIKE is that the new table is not permanently
      coupled to the old, so there's no reason to assume its rowtype will stay
      compatible.  In the inheritance case, we should ideally allow such
      constraints, but doing so will require nontrivial refactoring of CREATE
      TABLE processing (because we'd need to know the OID of the new table's
      rowtype before we adjust inherited CHECK constraints).  In view of the lack
      of previous complaints, that doesn't seem worth the risk in a back-patched
      bug fix, so just make it throw error for the inheritance case as well.
      
      Along the way, replace change_varattnos_of_a_node() with a more robust
      function map_variable_attnos(), which is capable of being extended to
      handle insertion of ConvertRowtypeExpr whenever we get around to fixing
      the inheritance case nicely, and in the meantime it returns a failure
      indication to the caller so that a helpful message with some context can be
      thrown.  Also, this code will do the right thing with subselects (if we
      ever allow them in CHECK or indexes), and it range-checks varattnos before
      using them to index into the map array.
      
      Per report from Sergey Konoplev.  Back-patch to all supported branches.
  9. 26 Jun, 2012: 1 commit
    • Backport fsync queue compaction logic to all supported branches. · ef0f9dde
      Committed by Robert Haas
      This backports commit 7f242d88,
      except for the counter in pg_stat_bgwriter.  The underlying problem
      (namely, that a full fsync request queue causes terrible checkpoint
      behavior) continues to be reported in the wild, and this code seems
      to be safe and robust enough to risk back-porting the fix.
  10. 22 Jun, 2012: 1 commit
    • Fix memory leak in ARRAY(SELECT ...) subqueries. · b02cd9c7
      Committed by Tom Lane
      Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which
      I think can only happen if the sub-select is embedded in a larger,
      correlated subquery) would leak memory for the duration of the query,
      due to not reclaiming the array generated in the previous execution.
      Per bug #6698 from Armando Miraglia.  Diagnosis and fix idea by Heikki,
      patch itself by me.
      
      This has been like this all along, so back-patch to all supported versions.
  11. 20 Jun, 2012: 2 commits
  12. 05 Jun, 2012: 1 commit
    • Fix some more bugs in contrib/xml2's xslt_process(). · 66fb03f5
      Committed by Tom Lane
      It failed to check for error return from xsltApplyStylesheet(), as reported
      by Peter Gagarinov.  (So far as I can tell, libxslt provides no convenient
      way to get a useful error message in failure cases.  There might be some
      inconvenient way, but considering that this code is deprecated it's hard to
      get enthusiastic about putting lots of work into it.  So I just made it say
      "failed to apply stylesheet", in line with the existing error checks.)
      
      While looking at the code I also noticed that the string returned by
      xsltSaveResultToString was never freed, resulting in a session-lifespan
      memory leak.
      
      Back-patch to all supported versions.
  13. 01 Jun, 2012: 4 commits
  14. 31 May, 2012: 3 commits
    • Update time zone data files to tzdata release 2012c. · e4f08846
      Committed by Tom Lane
      DST law changes in Antarctica, Armenia, Chile, Cuba, Falkland Islands,
      Gaza, Haiti, Hebron, Morocco, Syria, Tokelau Islands.
      Historical corrections for Canada.
    • Ignore SECURITY DEFINER and SET attributes for a PL's call handler. · 8851d5e9
      Committed by Tom Lane
      It's not very sensible to set such attributes on a handler function;
      but if one were to do so, fmgr.c went into infinite recursion because
      it would call fmgr_security_definer instead of the handler function proper.
      There is no way for fmgr_security_definer to know that it ought to call the
      handler and not the original function referenced by the FmgrInfo's fn_oid,
      so it tries to do the latter, causing the whole process to start over
      again.
      
      Ordinarily such misconfiguration of a procedural language's handler could
      be written off as superuser error.  However, because we allow non-superuser
      database owners to create procedural languages and the handler for such a
      language becomes owned by the database owner, it is possible for a database
      owner to crash the backend, which ideally shouldn't be possible without
      superuser privileges.  In 9.2 and up we will adjust things so that the
      handler functions are always owned by superusers, but in existing branches
      this is a minor security fix.
      
      Problem noted by Noah Misch (after several of us had failed to detect
      it :-().  This is CVE-2012-2655.
    • Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich. · 4d3482a7
      Committed by Tom Lane
      We used to only allow offsets less than +/-13 hours, then it was +/-14,
      then it was +/-15.  That's still not good enough though, as per today's bug
      report from Patric Bechtel.  This time I actually looked through the Olson
      timezone database to find the largest offsets used anywhere.  The winners
      are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at
      +15:13:42 until 1867.  So we'd better allow offsets less than +/-16 hours.
      
      Given the history, we are way overdue to have some greppable #define
      symbols controlling this, so make some ... and also remove an obsolete
      comment that didn't get fixed the last time.
      
      Back-patch to all supported branches.
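      The "greppable #define" idea amounts to deriving every range check from one symbol. Below is a minimal sketch with hypothetical TOY_ names (the commit message does not say what the real symbols are called). It checks offsets in seconds against the new limit of strictly less than 16 hours, and both historical extremes quoted above fall inside it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SECS_PER_HOUR   3600
#define TOY_MAX_TZDISP_HOUR 15    /* largest hour part accepted */
#define TOY_TZDISP_LIMIT    ((TOY_MAX_TZDISP_HOUR + 1) * TOY_SECS_PER_HOUR)

/* Build a signed offset in seconds from sign and h:m:s parts. */
int toy_tz_offset(int sign, int h, int m, int s)
{
    assert(sign == 1 || sign == -1);
    return sign * (h * TOY_SECS_PER_HOUR + m * 60 + s);
}

/* True if the offset from Greenwich lies within +/-15:59:59. */
bool toy_tz_offset_ok(int secs)
{
    return secs > -TOY_TZDISP_LIMIT && secs < TOY_TZDISP_LIMIT;
}
```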
  15. 30 May, 2012: 1 commit
    • Fix incorrect password transformation in contrib/pgcrypto's DES crypt(). · dd957a5b
      Committed by Tom Lane
      Overly tight coding caused the password transformation loop to stop
      examining input once it had processed a byte equal to 0x80.  Thus, if the
      given password string contained such a byte (which is possible though not
      highly likely in UTF8, and perhaps also in other non-ASCII encodings), all
      subsequent characters would not contribute to the hash, making the password
      much weaker than it appears on the surface.
      
      This would only affect cases where applications used DES crypt() to encode
      passwords before storing them in the database.  If a weak password has been
      created in this fashion, the hash will stop matching after this update has
      been applied, so it will be easy to tell if any passwords were unexpectedly
      weak.  Changing to a different password would be a good idea in such a case.
      (Since DES has been considered inadequately secure for some time, changing
      to a different encryption algorithm can also be recommended.)
      
      This code, and the bug, are shared with at least PHP, FreeBSD, and OpenBSD.
      Since the other projects have already published their fixes, there is no
      point in trying to keep this commit private.
      
      This bug has been assigned CVE-2012-2143, and credit for its discovery goes
      to Rubin Xu and Joseph Bonneau.
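      The "overly tight coding" is easiest to see side by side. The sketch below is reconstructed for illustration and is not copied from the pgcrypto sources; the toy_* names are hypothetical. The buggy loop advances the input pointer only when the shifted byte is nonzero, intending to stop at the terminating NUL. However, 0x80 << 1 also truncates to zero in an unsigned char, so the loop stalls on that byte and fills the rest of the 8-byte key with zeroes.

```c
#include <assert.h>

/* Buggy: advances only when the shifted byte is nonzero, which is false
 * for 0x80 as well as for the terminating NUL. */
void toy_des_key_buggy(const char *password, unsigned char keybuf[8])
{
    const unsigned char *key = (const unsigned char *) password;
    unsigned char *q = keybuf;

    assert(password != 0);
    while (q - keybuf < 8) {
        if ((*q++ = (unsigned char) (*key << 1)))
            key++;
    }
}

/* Fixed: advance unless we are sitting on the terminating NUL. */
void toy_des_key_fixed(const char *password, unsigned char keybuf[8])
{
    const unsigned char *key = (const unsigned char *) password;
    unsigned char *q = keybuf;

    assert(password != 0);
    while (q - keybuf < 8) {
        *q++ = (unsigned char) (*key << 1);
        if (*key != '\0')
            key++;
    }
}
```

      With the password "\x80" "xyz", the buggy version produces an all-zero key, so any password that starts with that byte yields the same key. The fixed version still mixes in x, y and z.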
  16. 29 May, 2012: 1 commit
    • Teach AbortOutOfAnyTransaction to clean up partially-started transactions. · b3d9db46
      Committed by Tom Lane
      AbortOutOfAnyTransaction failed to do anything if the state it saw on
      entry corresponded to failing partway through StartTransaction.  I fixed
      AbortCurrentTransaction to cope with that case way back in commit
      60b2444c, but evidently overlooked that
      AbortOutOfAnyTransaction should do likewise.
      
      Back-patch to all supported branches.  It's not clear that this omission
      has any more-than-cosmetic consequences, but it's also not clear that it
      doesn't, so back-patching seems the least risky choice.
  17. 27 May, 2012: 1 commit
    • Prevent synchronized scanning when systable_beginscan chooses a heapscan. · 422022b1
      Committed by Tom Lane
      The only interesting-for-performance case wherein we force heapscan here
      is when we're rebuilding the relcache init file, and the only such case
      that is likely to be examining a catalog big enough to be syncscanned is
      RelationBuildTupleDesc.  But the early-exit optimization in that code gets
      broken if we start the scan at a random place within the catalog, so that
      allowing syncscan is actually a big deoptimization if pg_attribute is large
      (at least for the normal case where the rows for core system catalogs have
      never been changed since initdb).  Hence, prevent syncscan here.  Per my
      testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane,
      though neither of them seems to have actually hit this specific problem.
      
      Back-patch to 8.3, where syncscan was introduced.
  18. 26 May, 2012: 2 commits
    • Fix string truncation to be multibyte-aware in text_name and bpchar_name. · 6f163609
      Committed by Tom Lane
      Previously, casts to name could generate invalidly-encoded results.
      
      Also, make these functions match namein() more exactly, by consistently
      using palloc0() instead of ad-hoc zeroing code.
      
      Back-patch to all supported branches.
      
      Karl Schnaitter and Tom Lane
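      Multibyte-aware truncation means clipping at a character boundary rather than at a raw byte count. Below is a minimal sketch for UTF-8 only, in the spirit of pg_mbcliplen() but not its implementation. TOY_NAMEDATALEN is deliberately tiny (PostgreSQL's NAMEDATALEN defaults to 64), the input is assumed to be valid UTF-8, and memset() stands in for palloc0() in zero-filling the fixed-size result.

```c
#include <assert.h>
#include <string.h>

#define TOY_NAMEDATALEN 8   /* tiny stand-in for NAMEDATALEN */

/* Largest byte length <= limit that does not split a UTF-8 character. */
size_t toy_utf8_cliplen(const char *s, size_t len, size_t limit)
{
    if (len <= limit)
        return len;
    /* Back up while s[limit] is a continuation byte (10xxxxxx). */
    while (limit > 0 && ((unsigned char) s[limit] & 0xC0) == 0x80)
        limit--;
    return limit;
}

/* Copy src into a zero-filled fixed-size name, clipping safely. */
void toy_text_to_name(char dst[TOY_NAMEDATALEN], const char *src)
{
    size_t n;

    assert(src != NULL);
    n = toy_utf8_cliplen(src, strlen(src), TOY_NAMEDATALEN - 1);
    memset(dst, 0, TOY_NAMEDATALEN);   /* trailing bytes are all zero */
    memcpy(dst, src, n);
}
```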
    • Use binary search instead of brute-force scan in findNamespace(). · bd43c50a
      Committed by Tom Lane
      The previous coding presented a significant bottleneck when dumping
      databases containing many thousands of schemas, since the total time
      spent searching would increase roughly as O(N^2) in the number of objects.
      Noted by Jeff Janes, though I rewrote his proposed patch to use the
      existing findObjectByOid infrastructure.
      
      Since this is a longstanding performance bug, backpatch to all supported
      versions.
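      The change replaces a linear scan per lookup with a binary search over OID-sorted entries, which turns N lookups from O(N^2) into O(N log N). Below is a minimal sketch using the C library's qsort() and bsearch() over a hypothetical ToyNamespace array. It does not reproduce pg_dump's findObjectByOid infrastructure.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int ToyOid;

typedef struct {
    ToyOid      oid;
    const char *nspname;
} ToyNamespace;

static int toy_cmp_oid(const void *a, const void *b)
{
    ToyOid x = ((const ToyNamespace *) a)->oid;
    ToyOid y = ((const ToyNamespace *) b)->oid;
    return (x > y) - (x < y);     /* avoids overflow from subtraction */
}

/* Sort once, after all namespaces have been loaded. */
void toy_sort_namespaces(ToyNamespace *arr, size_t n)
{
    assert(arr != NULL || n == 0);
    if (n > 0)
        qsort(arr, n, sizeof(ToyNamespace), toy_cmp_oid);
}

/* O(log N) per lookup instead of a linear scan. */
const ToyNamespace *toy_find_namespace(const ToyNamespace *arr, size_t n,
                                       ToyOid oid)
{
    ToyNamespace key = { oid, NULL };

    if (n == 0)
        return NULL;
    return bsearch(&key, arr, n, sizeof(ToyNamespace), toy_cmp_oid);
}
```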
  19. 23 May, 2012: 1 commit
    • Ensure that seqscans check for interrupts at least once per page. · c994b921
      Committed by Tom Lane
      If a seqscan encounters many consecutive pages containing only dead tuples,
      it can remain in the loop in heapgettup for a long time, and there was no
      CHECK_FOR_INTERRUPTS anywhere in that loop.  This meant there were
      real-world situations where a query would be effectively uncancelable for
      long stretches.  Add a check placed to occur once per page, which should be
      enough to provide reasonable response time without adding any measurable
      overhead.
      
      Report and patch by Merlin Moncure (though I tweaked it a bit).
      Back-patch to all supported branches.
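      Below is a minimal sketch of the once-per-page placement. It assumes a hypothetical flag and scan loop rather than heapgettup() itself. A signal handler only records the request, and the loop polls the flag at the top of each page, so even a long run of all-dead pages cannot delay cancellation indefinitely.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for the backend's pending-interrupt flag. */
static volatile sig_atomic_t toy_interrupt_pending = 0;

/* A signal handler only sets the flag; the scan loop acts on it later. */
void toy_cancel_handler(int signo)
{
    (void) signo;
    toy_interrupt_pending = 1;
}

/* Scan pages for the first live tuple.  Returns the page number, -1 if every
 * page holds only dead tuples, or -2 if a cancel request was noticed. */
int toy_seqscan(const int *live_tuples_per_page, int npages)
{
    assert(npages >= 0);
    for (int page = 0; page < npages; page++) {
        /* Once per page: cheap, yet bounds how long a cancel goes unseen
         * even across many consecutive pages with nothing visible. */
        if (toy_interrupt_pending)
            return -2;
        if (live_tuples_per_page[page] > 0)
            return page;
    }
    return -1;
}
```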
  20. 16 May, 2012: 1 commit
  21. 11 May, 2012: 1 commit
    • Fix Windows implementation of PGSemaphoreLock. · fcc0ba31
      Committed by Tom Lane
      The original coding failed to reset ImmediateInterruptOK before returning,
      which would potentially allow a subsequent query-cancel interrupt to be
      accepted at an unsafe point.  This is a really nasty bug since it's so hard
      to predict the consequences, but they could be unpleasant.
      
      Also, ensure that signal handlers are serviced before this function
      returns, even if the semaphore is already set.  This should make the
      behavior more like Unix.
      
      Back-patch to all supported versions.
  22. 03 May, 2012: 1 commit
  23. 28 Apr, 2012: 1 commit
    • Fix printing of whole-row Vars at top level of a SELECT targetlist. · 092d1d9d
      Committed by Tom Lane
      Normally whole-row Vars are printed as "tabname.*".  However, that does not
      work at top level of a targetlist, because per SQL standard the parser will
      think that the "*" should result in column-by-column expansion; which is
      not at all what a whole-row Var implies.  We used to just print the table
      name in such cases, which works most of the time; but it fails if the table
      name matches a column name available anywhere in the FROM clause.  This
      could lead for instance to a view being interpreted differently after dump
      and reload.  Adding parentheses doesn't fix it, but there is a reasonably
      simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
      syntactically at top level anymore.  This makes the printing of such
      whole-row Vars a lot more consistent with other Vars, and may indeed fix
      more cases than just the reported one; I'm suspicious that cases involving
      schema qualification probably didn't work properly before, either.
      
      Per bug report and fix proposal from Abbas Butt, though this patch is quite
      different in detail from his.
      
      Back-patch to all supported versions.
  24. 27 Apr, 2012: 1 commit
    • Fix syslogger's rotation disable/re-enable logic. · f8d7f9ad
      Committed by Tom Lane
      If it fails to open a new log file, the syslogger assumes there's something
      wrong with its parameters (such as log_directory), and stops attempting
      automatic time-based or size-based log file rotations.  Sending it SIGHUP
      is supposed to start that up again.  However, the original coding for that
      was really bogus, involving clobbering a couple of GUC variables and hoping
      that SIGHUP processing would restore them.  Get rid of that technique in
      favor of maintaining a separate flag showing we've turned rotation off.
      Per report from Mark Kirkwood.
      
      Also, the syslogger will automatically attempt to create the log_directory
      directory if it doesn't exist, but that was only happening at startup.
      For consistency and ease of use, it should do the same whenever the value
      of log_directory is changed by SIGHUP.
      
      Back-patch to all supported branches.
  25. 26 Apr, 2012: 1 commit
    • Fix edge-case behavior of pg_next_dst_boundary(). · 17fc5db7
      Committed by Tom Lane
      Due to rather sloppy thinking (on my part, I'm afraid) about the
      appropriate behavior for boundary conditions, pg_next_dst_boundary() gave
      undefined, platform-dependent results when the input time is exactly the
      last recorded DST transition time for the specified time zone, as a result
      of fetching values one past the end of its data arrays.
      
      Change its specification to be that it always finds the next DST boundary
      *after* the input time, and adjust code to match that.  The sole existing
      caller, DetermineTimeZoneOffset, doesn't actually care about this
      distinction, since it always uses a probe time earlier than the instant
      that it does care about.  So it seemed best to me to change the API to make
      the result=1 and result=0 cases more consistent, specifically to ensure
      that the "before" outputs always describe the state at the given time,
      rather than hacking the code to obey the previous API comment exactly.
      
      Per bug #6605 from Sergey Burladyan.  Back-patch to all supported versions.
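      The corrected specification is to find the first transition strictly after the input time and never index past the last one. It can be sketched as a binary search over sorted transition times. This toy is not the real pg_next_dst_boundary(), which works on time zone data and also fills in output parameters describing the state before and after the boundary. toy_next_boundary and its plain long array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first transition strictly after t, or n if there is none.
 * Never reads trans[n]. */
size_t toy_next_boundary(const long *trans, size_t n, long t)
{
    size_t lo = 0, hi = n;          /* answer always lies in [lo, hi] */

    assert(trans != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (trans[mid] <= t)        /* '<=' gives "strictly after" */
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

      When t equals the last recorded transition (300 in the test data), the search returns n, meaning no later boundary, without touching the element past the end.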
  26. 18 Apr, 2012: 3 commits
  27. 12 Apr, 2012: 1 commit
    • Clamp indexscan filter condition cost estimate to be not less than zero. · 67a48385
      Committed by Tom Lane
      cost_index tries to estimate the per-tuple costs of evaluating filter
      conditions (a/k/a qpquals) by subtracting the estimated cost of the
      indexqual conditions from that of the baserestrictinfo conditions.  This is
      correct so long as the indexquals list is a subset of the baserestrictinfo
      list.  However, in the presence of derived indexable conditions it's
      completely wrong, leading to bogus or even negative scan cost estimates,
      as seen for example in bug #6579 from Istvan Endredy.  In practice the
      problem isn't severe except in the specific case of a LIKE optimization on
      a functional index containing a very expensive function.
      
      A proper fix for this might change cost estimates by more than people would
      like for stable branches, so in the back branches let's just clamp the cost
      difference to be not less than zero.  That will at least prevent completely
      insane behavior, while not changing the results normally.
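      The back-branch fix is a clamp on a subtraction. Below is a minimal sketch with a hypothetical function name and made-up cost figures (not taken from bug #6579). It shows how a derived index condition that is more expensive than the original restriction would otherwise yield a negative per-tuple filter cost.

```c
#include <assert.h>

/* Per-tuple cost of the quals left for filtering after the index quals.
 * The subtraction is only valid when the index quals are a subset of the
 * restriction quals, so clamp the result at zero. */
double toy_qpqual_cost(double baserestrict_cost, double indexqual_cost)
{
    double diff = baserestrict_cost - indexqual_cost;

    assert(baserestrict_cost >= 0.0 && indexqual_cost >= 0.0);
    return diff < 0.0 ? 0.0 : diff;
}
```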
  28. 09 Apr, 2012: 3 commits
    • Fix an Assert that turns out to be reachable after all. · 454c7fb3
      Committed by Tom Lane
      estimate_num_groups() gets unhappy with
      	create table empty();
      	select * from empty except select * from empty e2;
      I can't see any actual use-case for such a query (and the table is illegal
      per SQL spec), but it seems like a good idea that it not cause an assert
      failure.
    • set_stack_base() no longer needs to be called in PostgresMain. · 2f8659b0
      Committed by Heikki Linnakangas
      This was a thinko in the previous commit. Now that the stack base pointer
      is set in PostmasterMain and SubPostmasterMain, it doesn't need to be set in
      PostgresMain anymore.
    • Do stack-depth checking in all postmaster children. · ddeac5de
      Committed by Heikki Linnakangas
      We used to only initialize the stack base pointer when starting up a regular
      backend, not in other processes. In particular, autovacuum workers can run
      arbitrary user code, and without stack-depth checking, infinite recursion
      in e.g. an index expression will bring down the whole cluster.
      
      The comment about PL/Java using set_stack_base() is not yet true. As the
      code stands, PL/Java still modifies the stack_base_ptr variable directly.
      However, it's been discussed in the PL/Java mailing list that it should be
      changed to use the function, because PL/Java is currently oblivious to the
      register stack used on Itanium. There's another issue with PL/Java, namely
      that the stack base pointer it sets is not really the base of the stack, it
      could be something close to the bottom of the stack. That's a separate issue
      that might need some further changes to this code, but that's a different
      story.
      
      Backpatch to all supported releases.
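      Stack-depth checking works by recording an address near the base of the stack at process start and comparing later stack addresses against it. Below is a minimal sketch of that idea with hypothetical toy_* names and a 64 kB limit. It converts addresses to uintptr_t so the distance works for either stack growth direction. Until the base is set, the check never fires, which is the gap this commit closes for non-backend children such as autovacuum workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uintptr_t toy_stack_base = 0;                 /* 0 means "not set" */
static const uintptr_t toy_max_stack_depth = 64 * 1024;

/* Call once near the top of each process's main routine. */
void toy_set_stack_base(void)
{
    char here;
    toy_stack_base = (uintptr_t) &here;
}

bool toy_stack_is_too_deep(void)
{
    char here;
    uintptr_t cur = (uintptr_t) &here;
    uintptr_t depth;

    if (toy_stack_base == 0)
        return false;            /* base never set: no protection at all */
    depth = cur > toy_stack_base ? cur - toy_stack_base : toy_stack_base - cur;
    return depth > toy_max_stack_depth;
}

/* Recurse until the check trips; returns the depth reached. */
int toy_recurse(int n)
{
    volatile char frame[256];    /* volatile keeps each frame from being optimized away */
    int r;

    frame[0] = (char) n;
    if (toy_stack_is_too_deep())
        return n;                /* real code would raise an error here */
    r = toy_recurse(n + 1);
    return r + (frame[0] - frame[0]);
}
```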