1. 30 Jul, 2009 1 commit
    • 25d9bf2e · committed by Tom Lane
      Support deferrable uniqueness constraints.
      The current implementation fires an AFTER ROW trigger for each tuple that
      looks like it might be non-unique according to the index contents at the
      time of insertion.  This works well as long as there aren't many conflicts,
      but won't scale to massive unique-key reassignments.  Improving that case
      is a TODO item.
      
      Dean Rasheed
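The deferred check described in this commit message can be pictured with a small model. This is only a sketch: `DeferredUniqueIndex` and `shift_keys_up` are hypothetical names, the "index" is a plain dictionary, and the queued recheck stands in for the AFTER ROW trigger.

```python
class DeferredUniqueIndex:
    """Toy unique index whose violations are checked at commit time."""

    def __init__(self):
        self.entries = {}        # key -> set of live row ids
        self.recheck = set()     # keys queued for a commit-time recheck

    def insert(self, key, row_id):
        rows = self.entries.setdefault(key, set())
        if rows:
            # Looks non-unique right now: queue a recheck instead of failing.
            self.recheck.add(key)
        rows.add(row_id)

    def delete(self, key, row_id):
        self.entries[key].discard(row_id)

    def commit(self):
        # Only queued keys are rechecked, so the cost grows with the number
        # of apparent conflicts rather than with the size of the index.
        for key in self.recheck:
            if len(self.entries.get(key, ())) > 1:
                raise ValueError(f"duplicate key {key!r}")
        self.recheck.clear()


def shift_keys_up(index, rows):
    """Model of UPDATE t SET k = k + 1, applied in ascending key order.

    rows maps row_id -> key. Returns how many rechecks were queued.
    """
    for row_id, key in sorted(rows.items(), key=lambda kv: kv[1]):
        index.insert(key + 1, row_id)   # new version may collide for now
        index.delete(key, row_id)       # old version goes away
        rows[row_id] = key + 1
    return len(index.recheck)
```

In this model, shifting every key by one queues a recheck for almost every row, which is the "massive unique-key reassignment" case the message says does not scale.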
  2. 11 Jun, 2009 1 commit
  3. 02 Jan, 2009 1 commit
  4. 31 Dec, 2008 1 commit
  5. 14 Jul, 2008 1 commit
    • 9d035f42 · committed by Tom Lane
      Clean up the use of some page-header-access macros: principally, use
      SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
      makes the code clearer, and avoid casting between Page and PageHeader where
      possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.
      
      I did not apply the parts of the proposed patch that would have resulted in
      slightly changing the on-disk format of hash indexes; it seems to me that's
      not a win as long as there's any chance of having in-place upgrade for 8.4.
  6. 19 Jun, 2008 1 commit
  7. 07 Jun, 2008 1 commit
  8. 17 Apr, 2008 1 commit
    • d1cbd26d · committed by Tom Lane
      Repair two places where SIGTERM exit could leave shared memory state
      corrupted.  (Neither is very important if SIGTERM is used to shut down the
      whole database cluster together, but there's a problem if someone tries to
      SIGTERM individual backends.)  To do this, introduce new infrastructure
      macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
      of transiently pushing an on_shmem_exit cleanup hook.  Also use this method
      for createdb cleanup --- that wasn't a shared-memory-corruption problem,
      but SIGTERM abort of createdb could leave orphaned files lying around.
      
      Backpatch as far as 8.2.  The shmem corruption cases don't exist in 8.1,
      and the createdb usage doesn't seem important enough to risk backpatching
      further.
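The idea of "transiently pushing an on_shmem_exit cleanup hook" can be modelled outside the backend. This is a rough analogue, not the real macros: the `on_shmem_exit_hooks` list, `ensure_error_cleanup`, and `simulate_process_exit` are hypothetical stand-ins, and `SystemExit` plays the role of a SIGTERM-driven exit.

```python
from contextlib import contextmanager

on_shmem_exit_hooks = []   # stand-in for the backend's exit-callback stack


@contextmanager
def ensure_error_cleanup(cleanup, arg):
    """Keep `cleanup(arg)` registered only while the guarded block runs."""
    hook = (cleanup, arg)
    on_shmem_exit_hooks.append(hook)      # covers a fatal exit mid-block
    try:
        yield
    except Exception:
        on_shmem_exit_hooks.remove(hook)
        cleanup(arg)                      # ordinary error: clean up, re-raise
        raise
    else:
        on_shmem_exit_hooks.remove(hook)  # success: nothing left to undo


def simulate_process_exit():
    """Run pending hooks in reverse order, as an exiting process would."""
    while on_shmem_exit_hooks:
        cleanup, arg = on_shmem_exit_hooks.pop()
        cleanup(arg)
```

The hook is only on the stack for the duration of the block, so a process that exits later does not run a stale cleanup.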
  9. 11 Apr, 2008 1 commit
    • 4e82a954 · committed by Tom Lane
      Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole
      indexscan always occurs in one call, and the results are returned in a
      TIDBitmap instead of a limited-size array of TIDs.  This should improve
      speed a little by reducing AM entry/exit overhead, and it is necessary
      infrastructure if we are ever to support bitmap indexes.
      
      In an only slightly related change, add support for TIDBitmaps to preserve
      (somewhat lossily) the knowledge that particular TIDs reported by an index
      need to have their quals rechecked when the heap is visited.  This facility
      is not really used yet; we'll need to extend the forced-recheck feature to
      plain indexscans before it's useful, and that hasn't been coded yet.
      The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
      with weighted queries.  There might be other uses in future, but that one
      alone is sufficient reason.
      
      Heikki Linnakangas, with some adjustments by me.
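Here is a toy version of the two ideas in this message: one call that fills a bitmap for the whole scan, and a recheck flag that is stored "somewhat lossily". `TidBitmap` and `get_bitmap` are hypothetical names, and keeping the flag per page rather than per TID is an assumption chosen to show the lossiness.

```python
class TidBitmap:
    """Toy TID bitmap: per-page offset sets plus a per-page recheck flag."""

    def __init__(self):
        self.pages = {}   # block number -> [set of offsets, recheck flag]

    def add_tuples(self, tids, recheck=False):
        for block, offset in tids:
            entry = self.pages.setdefault(block, [set(), False])
            entry[0].add(offset)
            # The flag is kept per page, not per TID: once any TID on the
            # page needs a recheck, every TID on that page gets one.
            entry[1] = entry[1] or recheck

    def iterate(self):
        """Yield (block, sorted offsets, recheck) in physical block order."""
        for block in sorted(self.pages):
            offsets, recheck = self.pages[block]
            yield block, sorted(offsets), recheck


def get_bitmap(index_entries, predicate, bitmap, recheck=False):
    """Hypothetical amgetbitmap-style call: the whole scan in one call."""
    matches = [tid for key, tid in index_entries if predicate(key)]
    bitmap.add_tuples(matches, recheck)
    return len(matches)
```

Because the result is a bitmap rather than a fixed-size array, the caller never has to re-enter the access method to fetch the next batch.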
  10. 02 Jan, 2008 1 commit
  11. 17 Nov, 2007 1 commit
  12. 16 Nov, 2007 1 commit
  13. 12 Apr, 2007 1 commit
  14. 10 Apr, 2007 1 commit
    • 56218fbc · committed by Tom Lane
      Minor tweaking of index special-space definitions so that the various
      index types can be reliably distinguished by examining the special space
      on an index page.  Per my earlier proposal, plus the realization that
      there's no need for btree's vacuum cycle ID to cycle through every possible
      16-bit value.  Restricting its range a little costs nearly nothing and
      eliminates the possibility of collisions.
      Memo to self: remember to make bitmap indexes play along with this scheme,
      assuming that patch ever gets accepted.
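A sketch of the range restriction this message describes: if the btree vacuum cycle ID never enters a small reserved range of 16-bit values, other index types can store fixed tags from that range in the same position. The constants below are illustrative assumptions, not values taken from the commit, and both function names are hypothetical.

```python
MAX_CYCLE_ID = 0xFF7F   # assumed cap; values above it are left for page tags
OTHER_AM_PAGE_IDS = {"hash": 0xFF80, "gist": 0xFF81}   # illustrative tags


def next_cycle_id(current):
    """Advance the vacuum cycle counter, skipping 0 and the reserved range."""
    nxt = (current + 1) & 0xFFFF
    if nxt == 0 or nxt > MAX_CYCLE_ID:
        nxt = 1
    return nxt


def classify_page(last_two_bytes):
    """Guess the index type from the final 16-bit field of the special space."""
    for am, page_id in OTHER_AM_PAGE_IDS.items():
        if last_two_bytes == page_id:
            return am
    if last_two_bytes <= MAX_CYCLE_ID:
        return "btree"
    return "unknown"
```

Giving up a few values at the top of the range costs almost nothing, since the counter simply wraps a little earlier.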
  15. 08 Feb, 2007 1 commit
    • b79575ce · committed by Bruce Momjian
      Reduce WAL activity for page splits:
      > Currently, an index split writes all the data on the split page to
      > WAL. That's a lot of WAL traffic. The tuples that are copied to the
      > right page need to be WAL logged, but the tuples that stay on the
      > original page don't.
      
      Heikki Linnakangas
  16. 05 Feb, 2007 1 commit
    • 23c4978e · committed by Tom Lane
      Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant to
      describe the maximum size of index tuples (which is typically AM-dependent
      anyway); and consequently remove the bogus deduction for "special space"
      that was built into it.
      
      Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two
      bytes per toast chunk, and to ensure that the calculation correctly tracks any
      future changes in page header size.  The computation had been inaccurate in a
      way that didn't cause any harm except space wastage, but future changes could
      have broken it more drastically.
      
      Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte
      more than it could safely be.  This didn't cause any harm in practice because
      it's only compared against maxalign'd lengths, but future changes in the size
      of page headers or btree special space could have exposed the problem.
      
      initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the
      storage of toast tables.
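The point about BTMaxItemSize being one byte too large, yet harmless, follows from alignment arithmetic. Below is a minimal sketch assuming 8-byte maximum alignment; the helper names mirror the usual round-up and round-down operations, and the limits in the usage note are made-up numbers.

```python
def maxalign(n, align=8):
    """Round n up to the next multiple of align (a power of two)."""
    return (n + align - 1) & ~(align - 1)


def maxalign_down(n, align=8):
    """Round n down to the previous multiple of align."""
    return n & ~(align - 1)


def fits(item_len, limit, align=8):
    """Compare the maxaligned item length against a size limit."""
    return maxalign(item_len, align) <= limit
```

With a hypothetical limit of 2712 (a multiple of 8), `fits(n, 2713)` and `fits(n, 2712)` agree for every `n`, because the left-hand side is always a multiple of 8. The off-by-one only becomes visible if the comparison stops using maxaligned lengths or the limit stops being derived from aligned quantities.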
  17. 21 Jan, 2007 1 commit
  18. 09 Jan, 2007 1 commit
    • 44317582 · committed by Tom Lane
      Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST
      per-column options for btree indexes.  The planner's support for this is still
      pretty rudimentary; it does not yet know how to plan mergejoins with
      nondefault ordering options.  The documentation is pretty rudimentary, too.
      I'll work on improving that stuff later.
      
      Note incompatible change from prior behavior: ORDER BY ... USING will now be
      rejected if the operator is not a less-than or greater-than member of some
      btree opclass.  This prevents less-than-sane behavior if an operator that
      doesn't actually define a proper sort ordering is selected.
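The four ordering combinations can be illustrated on a plain list, with `None` standing in for NULL. `order_by` is a hypothetical helper. Its defaults follow the PostgreSQL convention that NULLs sort as if larger than any non-null value, so ASC implies NULLS LAST and DESC implies NULLS FIRST.

```python
def order_by(values, descending=False, nulls_first=None):
    """Sort values, placing None entries first or last as requested."""
    if nulls_first is None:
        # Default: NULLS LAST for ascending, NULLS FIRST for descending.
        nulls_first = descending
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls
```

For example, `order_by([3, None, 1], descending=True, nulls_first=False)` models `ORDER BY x DESC NULLS LAST`.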
  19. 06 Jan, 2007 1 commit
  20. 02 Nov, 2006 1 commit
    • 70ce5c90 · committed by Tom Lane
      Fix "failed to re-find parent key" btree VACUUM failure by revising page
      deletion code to avoid the case where an upper-level btree page remains "half
      dead" for a significant period of time, and to block insertions into a key
      range that is in process of being re-assigned to the right sibling of the
      deleted page's parent.  This prevents the scenario reported by Ed L. wherein
      index keys could become out-of-order in the grandparent index level.
      
      Since this is a moderately invasive fix, I'm applying it only to HEAD.
      The bug exists back to 7.4, but the back branches will get a different patch.
  21. 04 Oct, 2006 1 commit
  22. 24 Aug, 2006 1 commit
  23. 08 Aug, 2006 1 commit
    • e0028369 · committed by Tom Lane
      Make recovery from WAL be restartable, by executing a checkpoint-like
      operation every so often.  This improves the usefulness of PITR log
      shipping for hot standby: formerly, if the standby server crashed, it
      was necessary to restart it from the last base backup and replay all
      the WAL since then.  Now it will only need to reread about the same
      amount of WAL as the master server would.  The behavior might also
      come in handy during a long PITR replay sequence.  Simon Riggs,
      with some editorialization by Tom Lane.
  24. 26 Jul, 2006 1 commit
    • e6284649 · committed by Tom Lane
      Modify btree to delete known-dead index entries without an actual VACUUM.
      When we are about to split an index page to do an insertion, first look
      to see if any entries marked LP_DELETE exist on the page, and if so remove
      them to try to make enough space for the desired insert.  This should reduce
      index bloat in heavily-updated tables, although of course you still need
      VACUUM eventually to clean up the heap.
      
      Junji Teramoto
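The "try to free space before splitting" step might look like this in miniature. `LeafPage` is a hypothetical toy, capacity is counted in entries rather than bytes, and the `dead` flag stands in for the LP_DELETE mark.

```python
class LeafPage:
    """Toy index leaf page holding [key, dead_flag] entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []          # list of [key, dead]

    def mark_dead(self, key):
        for item in self.items:
            if item[0] == key:
                item[1] = True   # hint left behind by an earlier index scan

    def insert(self, key):
        """Insert key; return 'fit', 'pruned', or 'split'."""
        outcome = "fit"
        if len(self.items) >= self.capacity:
            # Page is full: first drop entries already known to be dead.
            self.items = [it for it in self.items if not it[1]]
            outcome = "pruned"
        if len(self.items) >= self.capacity:
            return "split"       # still full: caller must split the page
        self.items.append([key, False])
        self.items.sort(key=lambda it: it[0])
        return outcome
```

Only the index page is cleaned here. The heap rows behind the dead entries are untouched, which is why VACUUM is still needed eventually.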
  25. 12 Jul, 2006 1 commit
  26. 04 Jul, 2006 1 commit
    • b7b78d24 · committed by Tom Lane
      Code review for FILLFACTOR patch. Change WITH grammar as per earlier
      discussion (including making def_arg allow reserved words), add missed
      opt_definition for UNIQUE case.  Put the reloptions support code in a less
      random place (I chose to make a new file access/common/reloptions.c).
      Eliminate header inclusion creep.  Make the index options functions safely
      user-callable (seems like client apps might like to be able to test validity
      of options before trying to make an index).  Reduce overhead for normal case
      with no options by allowing rd_options to be NULL.  Fix some unmaintainably
      klugy code, including getting rid of Natts_pg_class_fixed at long last.
      Some stylistic cleanup too, and pay attention to keeping comments in sync
      with code.
      
      Documentation still needs work, though I did fix the omissions in
      catalogs.sgml and indexam.sgml.
  27. 02 Jul, 2006 1 commit
  28. 08 May, 2006 1 commit
    • 5749f6ef · committed by Tom Lane
      Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations
      into a single mostly-physical-order scan of the index.  This requires some
      ticklish interlocking considerations, but should create no material
      performance impact on normal index operations (at least given the
      already-committed changes to make scans work a page at a time).  VACUUM
      itself should get significantly faster in any index that's degenerated to a
      very nonlinear page order.  Also, we save one pass over the index entirely,
      except in the case where there were no deletions to do and so only one pass
      happened anyway.
      
      Original patch by Heikki Linnakangas, rework by Tom Lane.
  29. 07 May, 2006 1 commit
    • 09cb5c0e · committed by Tom Lane
      Rewrite btree index scans to work a page at a time in all cases (both
      btgettuple and btgetmulti).  This eliminates the problem of "re-finding" the
      exact stopping point, since the stopping point is effectively always a page
      boundary, and index items are never moved across pre-existing page boundaries.
      A small penalty is that the keys_are_unique optimization is effectively
      disabled (and, therefore, is removed in this patch), causing us to apply
      _bt_checkkeys() to at least one more tuple than necessary when looking up a
      unique key.  However, the advantages for non-unique cases seem great enough to
      accept this tradeoff.  Aside from simplifying and (sometimes) speeding up the
      indexscan code, this will allow us to reimplement btbulkdelete as a largely
      sequential scan instead of index-order traversal, thereby significantly
      reducing the cost of VACUUM.  Those changes will come in a separate patch.
      
      Original patch by Heikki Linnakangas, rework by Tom Lane.
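A page-at-a-time scan can be sketched as a generator that copies every match from a page before returning any of them. This is a simplified model: `scan_index` is a hypothetical name, pages are plain sorted lists, and locking is only mentioned in comments.

```python
def scan_index(pages, low, high):
    """Yield keys in [low, high] from a chain of sorted leaf pages.

    Each page is visited once: its matching items are copied to a private
    list (in a real system, while the page is locked) and only then handed
    out one by one. The scan's saved position is always a page boundary.
    """
    for page in pages:
        if page and page[0] > high:
            return                                      # past the range
        batch = [k for k in page if low <= k <= high]   # one visit per page
        yield from batch
```

Once a page's matches have been copied, later changes to that page do not affect what the scan returns from it, so there is no exact stopping point to re-find.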
  30. 13 Apr, 2006 1 commit
    • 49a7610c · committed by Tom Lane
      Fix an ancient oversight in btree xlog replay. When trying to determine if an
      upper-level insertion completes a previously-seen split, we cannot simply grab
      the downlink block number out of the buffer, because the buffer could contain
      a later state of the page --- or perhaps the page doesn't even exist at all
      any more, due to relation truncation.  These possibilities have been masked up
      to now because the use of full_page_writes effectively ensured that no xlog
      replay routine ever actually saw a page state newer than its own change.
      Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this
      in existing release branches, but we need a fix in HEAD if we want to have any
      hope of re-allowing full_page_writes.  Accordingly, adjust the contents of
      btree WAL records so that we can always get the downlink block number from the
      WAL record rather than having to depend on buffer contents.  Per report from
      Kevin Grittner and Peter Brant.
      
      Improve a few comments in related code while at it.
  31. 01 Apr, 2006 2 commits
    • 89bda95d · committed by Tom Lane
      Remove the 'slow' path for btree index build, which built the btree
      incrementally by successive inserts rather than by sorting the data.
      We were only using the slow path during bootstrap, apparently because
      when first written it failed during bootstrap --- but it works fine now
      AFAICT.  Removing it saves a hundred or so lines of code and produces
      noticeably (~10%) smaller initial states of the system catalog indexes.
      While that won't make much difference for heavily-modified catalogs,
      for the more static ones there may be a useful long-term performance
      improvement.
    • a8b8f4db · committed by Tom Lane
      Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
      misleadingly-named WriteBuffer routine, and instead require routines that
      change buffer pages to call MarkBufferDirty (which does exactly what it says).
      We also require that they do so before calling XLogInsert; this takes care of
      the synchronization requirement documented in SyncOneBuffer.  Note that
      because bufmgr takes the buffer content lock (in shared mode) while writing
      out any buffer, it doesn't matter whether MarkBufferDirty is executed before
      the buffer content change is complete, so long as the content change is
      completed before releasing exclusive lock on the buffer.  So it's OK to set
      the dirtybit before we fill in the LSN.
      This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
      Aside from making the code more transparent, we can also add some new
      debugging assertions, in particular that the caller of MarkBufferDirty must
      hold the buffer content lock, not merely a pin.
  32. 24 Mar, 2006 1 commit
    • 0a202070 · committed by Tom Lane
      Arrange to emit a description of the current XLOG record as error context
      when an error occurs during xlog replay.  Also, replace the former risky
      'write into a fixed-size buffer with no overflow detection' API for XLOG
      record description routines; use an expansible StringInfo instead.  (The
      latter accounts for most of the patch bulk.)
      
      Qingqing Zhou
  33. 05 Mar, 2006 1 commit
  34. 26 Jan, 2006 1 commit
  35. 24 Jan, 2006 1 commit
    • 7ccaf13a · committed by Tom Lane
      Instead of using a numberOfRequiredKeys count to distinguish required
      and non-required keys in a btree index scan, mark the required scankeys
      with private flag bits SK_BT_REQFWD and/or SK_BT_REQBKWD.  This seems
      at least marginally clearer to me, and it eliminates a wired-into-the-
      data-structure assumption that required keys are consecutive.  Even though
      that assumption will remain true for the foreseeable future, having it
      in there makes the code seem more complex than necessary.
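Marking individual scan keys as required, instead of counting a leading run of them, might look like this. The flag names echo SK_BT_REQFWD and SK_BT_REQBKWD, but `KeyFlag` and `check_keys` are hypothetical, and each key is reduced to a predicate on a single value.

```python
from enum import IntFlag


class KeyFlag(IntFlag):
    NONE = 0
    REQ_FWD = 1    # failing this key ends a forward scan
    REQ_BKWD = 2   # failing this key ends a backward scan


def check_keys(value, keys, forward=True):
    """Return (matches, continue_scan) for one index value.

    keys is a list of (predicate, flags) pairs.
    """
    needed = KeyFlag.REQ_FWD if forward else KeyFlag.REQ_BKWD
    for predicate, flags in keys:
        if not predicate(value):
            # A failed required key means no later tuple can match either.
            return False, not (flags & needed)
    return True, True
```

Because each key carries its own flags, nothing in the data structure requires the required keys to be consecutive.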
  36. 08 Dec, 2005 1 commit
  37. 07 Nov, 2005 1 commit
    • 766dc45d · committed by Tom Lane
      Add defenses to btree and hash index AMs to do simple sanity checks
      on every index page they read; in particular to catch the case of an
      all-zero page, which PageHeaderIsValid allows to pass.  It turns out
      hash already had this idea, but it was just Assert()ing things rather
      than doing a straight error check, and the Asserts were partially
      redundant with PageHeaderIsValid anyway.  Per recent failure example
      from Jim Nasby.  (gist still needs the same treatment.)
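A hedged sketch of the kind of check this message describes: reject an all-zero page explicitly, then confirm that the special space has the size the access method expects. `check_index_page` and its parameters are hypothetical, and the error strings are illustrative. Parsing a real page header is out of scope, so the special-space offset is passed in directly.

```python
def check_index_page(page, page_size, special_offset, expected_special_size):
    """Raise ValueError unless the page passes basic sanity checks.

    special_offset stands in for the header field that locates the special
    space; a real implementation would read it from the page header.
    """
    if len(page) != page_size:
        raise ValueError("unexpected page size")
    if not any(page):
        # An all-zero page can pass a generic header check, so test for it
        # explicitly.
        raise ValueError("index contains unexpected zero page")
    if page_size - special_offset != expected_special_size:
        raise ValueError("index contains corrupted page")
```

Raising a real error instead of asserting means the check still runs in builds where assertions are compiled out.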
  38. 15 Oct, 2005 1 commit
  39. 07 Jun, 2005 1 commit