1. 19 Mar, 2014 5 commits
    • H
      Make the handling of interrupted B-tree page splits more robust. · 40dae7ec
      Heikki Linnakangas committed
      Splitting a page consists of two separate steps: splitting the child page,
      and inserting the downlink for the new right page to the parent. Previously,
      we handled the case that you crash in between those steps with a cleanup
      routine after the WAL recovery had finished, which finished the incomplete
      split. However, that doesn't help if the page split is interrupted but the
      database doesn't crash, so that you don't perform WAL recovery. That could
      happen for example if you run out of disk space.
      
      Remove the end-of-recovery cleanup step. Instead, when a page is split, the
      left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink
      is inserted to the parent, the flag is cleared again. If an insertion sees
      a page with the flag set, it knows that the split was interrupted for some
      reason, and inserts the missing downlink before proceeding.
      
      I used the same approach to fix GIN and GiST split algorithms earlier. This
      was the last WAL cleanup routine, so we could get rid of that whole
      machinery now, but I'll leave that for a separate patch.
      
      Reviewed by Peter Geoghegan.
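The flag protocol described in this commit message can be sketched with a toy model. This is only an illustration in Python, not PostgreSQL's C implementation: `ToyIndex`, `Page`, and all method names are hypothetical, and the "parent" is reduced to a flat list of downlinks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    keys: List[int] = field(default_factory=list)
    right_sibling: Optional[int] = None
    incomplete_split: bool = False  # models the INCOMPLETE_SPLIT flag


class ToyIndex:
    """Hypothetical two-level tree: one parent holding downlinks to leaf pages."""

    def __init__(self) -> None:
        self.pages: Dict[int, Page] = {0: Page()}
        self.parent_downlinks: List[int] = [0]
        self._next_id = 1

    def split_child(self, left_id: int) -> int:
        """Step 1: split the child and flag the left half as incomplete."""
        left = self.pages[left_id]
        mid = len(left.keys) // 2
        right_id = self._next_id
        self._next_id += 1
        self.pages[right_id] = Page(left.keys[mid:], left.right_sibling)
        left.keys = left.keys[:mid]
        left.right_sibling = right_id
        left.incomplete_split = True
        return right_id

    def insert_downlink(self, left_id: int) -> None:
        """Step 2: add the downlink for the right page and clear the flag."""
        left = self.pages[left_id]
        assert left.right_sibling is not None
        pos = self.parent_downlinks.index(left_id)
        self.parent_downlinks.insert(pos + 1, left.right_sibling)
        left.incomplete_split = False

    def insert_key(self, page_id: int, key: int) -> None:
        """An insertion that sees the flag finishes the split before proceeding."""
        if self.pages[page_id].incomplete_split:
            self.insert_downlink(page_id)
        self.pages[page_id].keys.append(key)
        self.pages[page_id].keys.sort()
```

If `split_child` runs but `insert_downlink` never does (for example because the process ran out of disk space), the next `insert_key` on the left page repairs the tree without any recovery pass.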
    • T
      Fix some remaining int64 vestiges in contrib/test_shm_mq. · b6ec7c92
      Tom Lane committed
      Andres Freund and Tom Lane
    • R
      test_shm_mq: Use Size rather than uint64. · c676ac0f
      Robert Haas committed
      Commit 3bd261ca updated the API but
      neglected to make the corresponding edits here.
      
      Per Tom Lane and the buildfarm.
    • R
      Documentation for logical decoding. · 49c0864d
      Robert Haas committed
      Craig Ringer, Andres Freund, Christian Kruse, with edits by me.
    • R
      Add pg_recvlogical, a tool to receive logical decoding data. · 8bdd12bb
      Robert Haas committed
      This is fairly basic at the moment, but it's at least useful for
      testing and debugging, and possibly more.
      
      Andres Freund
  2. 18 Mar, 2014 8 commits
  3. 17 Mar, 2014 9 commits
    • H
      Fix thinko: have trueTriConsistentFn return GIN_TRUE. · d663d439
      Heikki Linnakangas committed
      While we're at it, also improve comments in ginlogic.c.
    • F
      Fix typos in comments. · 2bccced1
      Fujii Masao committed
      Thom Brown
    • F
      Fix bug in clean shutdown of a walsender that pg_receivexlog is connected to. · 5c6d9fc4
      Fujii Masao committed
      On clean shutdown, walsender waits for all WAL to be replicated to the
      standby and then exits. It determined whether replication had completed by
      checking whether its sent location equaled the standby's flush location.
      Unfortunately, this condition never becomes true when the connected client
      is one such as pg_receivexlog, which always reports an invalid flush
      location, so walsender waits forever.
      
      This commit changes walsender so that it checks the standby's write
      location instead when the flush location is invalid.
      
      Back-patch to 9.1 where enough infrastructure for this exists.
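The before-and-after shutdown condition from this commit message can be written as two small predicates. This is a hedged Python sketch, not the walsender code: the function names and the use of `0` as the "invalid location" marker are assumptions made for illustration.

```python
INVALID_LSN = 0  # assumed stand-in for an invalid WAL location


def caught_up_old(sent: int, write: int, flush: int) -> bool:
    """Old rule: only the standby's flush location counts."""
    return sent == flush


def caught_up_new(sent: int, write: int, flush: int) -> bool:
    """New rule: fall back to the write location if flush is invalid."""
    replicated = write if flush == INVALID_LSN else flush
    return sent == replicated
```

With a client that always reports an invalid flush location, `caught_up_old(100, 100, INVALID_LSN)` stays false no matter how much has been written, whereas `caught_up_new` becomes true once the write location reaches the sent location.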
    • M
      Fix small typo in comment · 02703ff2
      Magnus Hagander committed
      Michael Paquier
    • A
      plperl: Fix memory leak in hek2cstr · bd1154ed
      Alvaro Herrera committed
      Backpatch all the way back to 9.1, where it was introduced by commit
      50d89d42.
      
      Reported by Sergey Burladyan in #9223
      Author: Alex Hunsaker
    • T
      Fix unportable shell-script syntax in pg_upgrade's test.sh. · 0268d21e
      Tom Lane committed
      I discovered the hard way that on some old shells, the locution
          FOO=""   unset FOO
      does not behave the same as
          FOO="";  unset FOO
      and in fact leaves FOO set to an empty string.  test.sh was inconsistently
      spelling it different ways on adjacent lines.
      
      This got broken relatively recently, in commit c737a2e5, so the lack of
      field reports to date doesn't represent a lot of evidence that the problem
      is rare.
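To check how a particular shell treats the two spellings, a small probe along the following lines may help. It is a sketch in Python that assumes a POSIX-style `sh` is on the PATH; the helper name `foo_state_after` is hypothetical. No result is claimed for the first spelling, since the commit message reports that it differs between shells.

```python
import subprocess

# Prints "set" if FOO exists (even as an empty string), otherwise "unset".
PROBE = 'if [ "${FOO+set}" = set ]; then echo set; else echo unset; fi'


def foo_state_after(snippet: str, shell: str = "sh") -> str:
    """Run the snippet, then report whether FOO is still set in that shell."""
    result = subprocess.run(
        [shell, "-c", f"{snippet}\n{PROBE}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for snippet in ('FOO=""   unset FOO', 'FOO="";  unset FOO'):
        print(repr(snippet), "->", foo_state_after(snippet))
```

The second spelling runs two separate commands, so FOO ends up unset; the first is a single command with a prefixed assignment, which is where shells can disagree.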
    • P
      Make punctuation consistent · 2861e8e9
      Peter Eisentraut committed
    • P
      Fix whitespace · e2b95947
      Peter Eisentraut committed
    • T
      Fix advertised dispsize for libpq's sslmode connection parameter. · f4051e36
      Tom Lane committed
      "8" was correct back when "disable" was the longest allowed value, but
      since "verify-full" was added, it should be "12".  Given the lack of
      complaints, I wouldn't be surprised if nobody is actually using these
      values ... but still, if they're in the API, they should be right.
      
      Noticed while pursuing a different problem.  It's been wrong for quite
      a long time, so back-patch to all supported branches.
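The numbers in this message are consistent with a display size equal to the longest allowed value plus one character. The short Python check below works through that arithmetic; the "plus one" rule is an inference from the "8" and "12" quoted above, not something the commit states, and `display_size` is a hypothetical helper.

```python
# The sslmode values accepted by libpq.
SSLMODE_VALUES = ["disable", "allow", "prefer", "require", "verify-ca", "verify-full"]


def display_size(values):
    """Longest value plus one extra character (assumed rule, see lead-in)."""
    return max(len(v) for v in values) + 1


print(display_size(["disable"]))     # 7 characters + 1 = 8
print(display_size(SSLMODE_VALUES))  # "verify-full" is 11 characters, + 1 = 12
```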
  4. 16 Mar, 2014 3 commits
    • M
      Cleanups from the remove-native-krb5 patch · 0294023a
      Magnus Hagander committed
      krb_srvname is actually not available anymore as a parameter server-side, since
      with gssapi we accept all principals in our keytab. It's still used in libpq for
      client side specification.
      
      In passing remove declaration of krb_server_hostname, where all the functionality
      was already removed.
      
      Noted by Stephen Frost, though a different solution than his suggestion
    • T
      First-draft release notes for 9.3.4. · e3c9f232
      Tom Lane committed
      As usual, the release notes for older branches will be made by cutting
      these down, but put them up for community review first.
    • T
      Update time zone data files to tzdata release 2014a. · aba7f567
      Tom Lane committed
      DST law changes in Fiji, Turkey; historical changes in Israel, Ukraine.
  5. 14 Mar, 2014 4 commits
    • H
      Fix race condition in B-tree page deletion. · efada2b8
      Heikki Linnakangas committed
      In short, we don't allow a page to be deleted if it's the rightmost child
      of its parent, but that situation can change after we check for it.
      
      Problem
      -------
      
      We check that the page to be deleted is not the rightmost child of its
      parent, and then lock its left sibling, the page itself, its right sibling,
      and the parent, in that order. However, if the parent page is split after
      the check but before acquiring the locks, the target page might become the
      rightmost child, if the split happens at the right place. That leads to an
      error in vacuum (I reproduced this by setting a breakpoint in a debugger):
      
      ERROR:  failed to delete rightmost child 41 of block 3 in index "foo_pkey"
      
      We currently re-check that the page is still not the rightmost child, and
      throw the above error if it is. We could easily just give up rather than throw
      an error, but that approach doesn't scale to half-dead pages. To recap,
      although we don't normally allow deleting the rightmost child, if the page
      is the *only* child of its parent, we delete the child page and mark the
      parent page as half-dead in one atomic operation. But before we do that, we
      check that the parent can later be deleted, by checking that it in turn is
      not the rightmost child of the grandparent (potentially recursing all the
      way up to the root). But the same situation can arise there - the
      grandparent can be split while we're not holding the locks. We end up with
      a half-dead page that we cannot delete.
      
      To make things worse, the keyspace of the deleted page has already been
      transferred to its right sibling. As the README points out, the keyspace at
      the grandparent level is "out-of-whack" until the half-dead page is deleted,
      and if enough tuples with keys in the transferred keyspace are inserted, the
      page might get split and a downlink might be inserted into the grandparent
      that is out-of-order. That might not cause any serious problem if it's
      transient (as the README ponders), but is surely bad if it stays that way.
      
      Solution
      --------
      
      This patch changes the page deletion algorithm to avoid that problem. Rather
      than deleting the pages from the bottom up after checking that the topmost
      page in the chain of to-be-deleted pages is not the rightmost child of its
      parent, it unlinks the pages from top to bottom. This way, the intermediate stages
      are similar to the intermediate stages in page splitting, and there is no
      transient stage where the keyspace is "out-of-whack". The topmost page in
      the to-be-deleted chain doesn't have a downlink pointing to it, like a page
      split before the downlink has been inserted.
      
      This also allows us to get rid of the cleanup step after WAL recovery, if we
      crash during page deletion. The deletion will be continued at next VACUUM,
      but the tree is consistent for searches and insertions at every step.
      
      This bug is old and affects all supported versions, but this patch is too
      big to back-patch (and changes the WAL record formats of related records).
      We have not heard any reports of the bug from users, so clearly it's not
      easy to bump into. Maybe backpatch later, after this has had some field
      testing.
      
      Reviewed by Kevin Grittner and Peter Geoghegan.
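The check-then-lock race from the "Problem" section can be reproduced in a toy model where a parent page is just an ordered list of child block numbers. This Python sketch is illustrative only; `is_rightmost_child`, `split_parent`, and `try_delete` are hypothetical names, and the block numbers are made up apart from 41, which echoes the error message above.

```python
from typing import List, Tuple


def is_rightmost_child(parent: List[int], child: int) -> bool:
    """A child is rightmost if it is the last downlink in its parent."""
    return parent[-1] == child


def split_parent(parent: List[int], split_at: int) -> Tuple[List[int], List[int]]:
    """Split the parent's downlinks into a left and a right page."""
    return parent[:split_at], parent[split_at:]


def try_delete(parent_at_check: List[int], parent_at_lock: List[int], target: int) -> str:
    """Old-style deletion: check first, take locks later, then re-check."""
    if is_rightmost_child(parent_at_check, target):
        return "skipped"  # never attempt to delete a rightmost child
    # ... window in which a concurrent insertion may split the parent ...
    if is_rightmost_child(parent_at_lock, target):
        return "error: failed to delete rightmost child"
    return "deleted"


if __name__ == "__main__":
    parent = [10, 20, 41, 50]
    left, _right = split_parent(parent, 3)  # split lands right after block 41
    print(try_delete(parent, parent, 41))   # no concurrent split
    print(try_delete(parent, left, 41))     # split happened inside the window
```

The second call fails because block 41 was not rightmost when checked but became rightmost in the left half once the parent split.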
    • T
      Prevent interrupts while reporting non-ERROR elog messages. · 6c461cb9
      Tom Lane committed
      This should eliminate the risk of recursive entry to syslog(3), which
      appears to be the cause of the hang reported in bug #9551 from James
      Morton.
      
      Arguably, the real problem here is auth.c's willingness to turn on
      ImmediateInterruptOK while executing fairly wide swaths of backend code.
      We may well need to work at narrowing the code ranges in which the
      authentication_timeout interrupt is enabled.  For the moment, though,
      this is a cheap and reasonably noninvasive fix for a field-reported
      failure; the other approach would be complex and not necessarily
      bug-free itself.
      
      Back-patch to all supported branches.
    • T
      Allow psql to print COPY command status in more cases. · f70a78bc
      Tom Lane committed
      Previously, psql would print the "COPY nnn" command status only for COPY
      commands executed server-side.  Now it will print that for frontend copies
      too (including \copy).  However, we continue to suppress the command status
      for COPY TO STDOUT, since in that case the copy data has been routed to the
      same place that the command status would go, and there is a risk of the
      status line being mistaken for another line of COPY data.  Doing that would
      break existing scripts, and it doesn't seem worth the benefit --- this case
      seems fairly analogous to SELECT, for which we also suppress the command
      status.
      
      Kumar Rajeev Rastogi, with substantial review by Amit Khandekar
    • T
      Avoid transaction-commit race condition while receiving a NOTIFY message. · 7bae0284
      Tom Lane committed
      Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
      whether a NOTIFY message's originating transaction is in progress,
      committed, or aborted.  The previous coding could accept a message from a
      transaction that was still in-progress according to the PGPROC array;
      if the client were fast enough at starting a new transaction, it might fail
      to see table rows added/updated by the message-sending transaction.  Which
      of course would usually be the point of receiving the message.  We noted
      this type of race condition long ago in tqual.c, but async.c overlooked it.
      
      The race condition probably cannot occur unless there are multiple NOTIFY
      senders in action, since an individual backend doesn't send NOTIFY signals
      until well after it's done committing.  But if two senders commit in close
      succession, it's certainly possible that we could see the second sender's
      message within the race condition window while responding to the signal
      from the first one.
      
      Per bug #9557 from Marko Tiikkaja.  This patch is slightly more invasive
      than what he proposed, since it removes the now-redundant
      TransactionIdDidAbort call.
      
      Back-patch to 9.0, where the current NOTIFY implementation was introduced.
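The importance of the check order can be seen in a toy model where a committing transaction is first recorded as committed and only afterwards leaves the set of running transactions. The Python below is a sketch under that assumption; `ToyXactStatus` and both `classify` functions are hypothetical stand-ins for the PGPROC array, the commit log, and the checks named in the message.

```python
from enum import Enum


class XactState(Enum):
    IN_PROGRESS = "in progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


class ToyXactStatus:
    def __init__(self) -> None:
        self.running = set()    # stand-in for the PGPROC array
        self.committed = set()  # stand-in for the commit log

    def begin(self, xid: int) -> None:
        self.running.add(xid)

    def mark_committed(self, xid: int) -> None:
        self.committed.add(xid)  # commit step 1

    def leave_running_set(self, xid: int) -> None:
        self.running.discard(xid)  # commit step 2


def classify(status: ToyXactStatus, xid: int) -> XactState:
    """Order from the commit: ask 'in progress?' first, then 'committed?'."""
    if xid in status.running:
        return XactState.IN_PROGRESS
    if xid in status.committed:
        return XactState.COMMITTED
    return XactState.ABORTED


def classify_commit_log_first(status: ToyXactStatus, xid: int) -> XactState:
    """Racy order: trusts the commit log before consulting the running set."""
    if xid in status.committed:
        return XactState.COMMITTED
    if xid in status.running:
        return XactState.IN_PROGRESS
    return XactState.ABORTED
```

Between `mark_committed` and `leave_running_set`, the racy variant already answers "committed" even though snapshots taken at that moment would still treat the transaction as running, which is the window the message describes.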
  6. 13 Mar, 2014 9 commits
  7. 12 Mar, 2014 2 commits
    • H
      Allow opclasses to provide tri-valued GIN consistent functions. · c5608ea2
      Heikki Linnakangas committed
      With the GIN "fast scan" feature, GIN can skip items without fetching all
      the keys for them, if it can prove that they don't match regardless of
      those keys. So far, it has done the proving by calling the boolean
      consistent function with all combinations of TRUE/FALSE for the unfetched
      keys, but since that's O(2^n), it becomes infeasible with more than a few
      keys. We can avoid calling consistent with all the combinations, if we can
      tell the operator class implementation directly which keys are unknown.
      
      This commit includes a triConsistent function for the built-in array and
      tsvector opclasses.
      
      Alexander Korotkov, with some changes by me.
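The difference between proving a result by enumeration and asking a tri-valued function directly can be sketched for one simple query type, "the item must contain all keys". This Python example is illustrative, not the GIN code: the `Tri` enum mirrors the GIN_FALSE/GIN_TRUE/GIN_MAYBE idea, and the function names are hypothetical.

```python
from enum import Enum
from itertools import product
from typing import Callable, List, Tuple


class Tri(Enum):
    FALSE = 0
    TRUE = 1
    MAYBE = 2


def consistent_all(check: List[bool]) -> bool:
    """Boolean consistent function for a 'contains all keys' query."""
    return all(check)


def tri_by_enumeration(
    consistent: Callable[[List[bool]], bool], check: List[Tri]
) -> Tuple[Tri, int]:
    """Try every TRUE/FALSE assignment for the unknown keys (2**n calls)."""
    unknown = [i for i, v in enumerate(check) if v is Tri.MAYBE]
    outcomes = set()
    calls = 0
    for combo in product([False, True], repeat=len(unknown)):
        trial = [v is Tri.TRUE for v in check]
        for i, value in zip(unknown, combo):
            trial[i] = value
        outcomes.add(consistent(trial))
        calls += 1
    if outcomes == {True}:
        return Tri.TRUE, calls
    if outcomes == {False}:
        return Tri.FALSE, calls
    return Tri.MAYBE, calls


def tri_consistent_all(check: List[Tri]) -> Tri:
    """Direct tri-valued answer for the same query, in a single pass."""
    if Tri.FALSE in check:
        return Tri.FALSE
    if Tri.MAYBE in check:
        return Tri.MAYBE
    return Tri.TRUE
```

Both approaches agree on the answer, but the enumeration needs `2**n` calls for `n` unknown keys, while the direct function inspects each key once.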
    • H
      In WAL replay, restore GIN metapage unconditionally to avoid torn page. · fecfc2b9
      Heikki Linnakangas committed
      We don't take a full-page image of the GIN metapage; instead, the WAL record
      contains all the information required to reconstruct it from scratch. But
      to avoid torn page hazards, we must re-initialize it from the WAL record
      every time, even if it already has a greater LSN, similar to how normal full
      page images are restored.
      
      This was highly unlikely to cause any problems in practice, because the GIN
      metapage is small. We rely on an update smaller than a 512 byte disk sector
      to be atomic elsewhere, at least in pg_control. But better safe than sorry,
      and this would be easy to overlook if more fields are added to the metapage
      so that it's no longer small.
      
      Reported by Noah Misch. Backpatch to all supported versions.
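The two replay rules contrasted in this message can be modeled in a few lines. This Python sketch is a simplification and not the actual redo routine: `ToyPage`, `redo_incremental`, and `redo_reinitialize` are hypothetical names, and a page is reduced to an LSN plus an opaque payload.

```python
from dataclasses import dataclass


@dataclass
class ToyPage:
    lsn: int
    payload: str


def redo_incremental(page: ToyPage, record_lsn: int, new_payload: str) -> bool:
    """Usual rule: skip the record if the page already carries its LSN or later."""
    if page.lsn >= record_lsn:
        return False
    page.payload = new_payload
    page.lsn = record_lsn
    return True


def redo_reinitialize(page: ToyPage, record_lsn: int, new_payload: str) -> bool:
    """Rule for records that rebuild the whole page: always overwrite."""
    page.payload = new_payload
    page.lsn = record_lsn
    return True


if __name__ == "__main__":
    # A torn write: the header (with the new LSN) reached disk, the rest did not.
    torn = ToyPage(lsn=200, payload="half new, half old")
    print(redo_incremental(torn, 200, "fully new"), torn.payload)
    print(redo_reinitialize(torn, 200, "fully new"), torn.payload)
```

Under the usual rule the torn page is left as it is because its LSN looks current; rebuilding it unconditionally from the record restores a consistent page, in the same way a full-page image would.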