  1. 30 Nov, 2010 · 1 commit
    • Simplify and speed up mapping of index opfamilies to pathkeys. · c0b5fac7
      Committed by Tom Lane
      Formerly we looked up the operators associated with each index (caching
      them in relcache) and then the planner looked up the btree opfamily
      containing such operators in order to build the btree-centric pathkey
      representation that describes the index's sort order.  This is quite
      pointless for btree indexes: we might as well just use the index's opfamily
      information directly.  That saves syscache lookup cycles during planning,
      and furthermore allows us to eliminate the relcache's caching of operators
      altogether, which may help in reducing backend startup time.
      
      I added code to plancat.c to perform the same type of double lookup
      on-the-fly if it's ever faced with a non-btree amcanorder index AM.
      If such a thing actually becomes interesting for production, we should
      replace that logic with some more-direct method for identifying the
      corresponding btree opfamily; but it's not worth spending effort on now.
      
      There is considerably more to do pursuant to my recent proposal to get rid
      of sort-operator-based representations of sort orderings, but this patch
      grabs some of the low-hanging fruit.  I'll look at the remainder of that
      work after the current commitfest.
  2. 11 Oct, 2010 · 1 commit
    • Support triggers on views. · 2ec993a7
      Committed by Tom Lane
      This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
      is fired instead of performing a physical insert/update/delete.  The
      trigger function is passed the entire old and/or new rows of the view,
      and must figure out what to do to the underlying tables to implement
      the update.  So this feature can be used to implement updatable views
      using trigger programming style rather than rule hacking.
      
      In passing, this patch corrects the names of some columns in the
      information_schema.triggers view.  It seems the SQL committee renamed
      them somewhere between SQL:99 and SQL:2003.
      
      Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
  3. 21 Sep, 2010 · 1 commit
  4. 14 Aug, 2010 · 1 commit
    • Include the backend ID in the relpath of temporary relations. · debcec7d
      Committed by Robert Haas
      This allows us to reliably remove all leftover temporary relation
      files on cluster startup without reference to system catalogs or WAL;
      therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
      and XLOG_XACT_ABORT WAL records.
      
      Since these changes require including a backend ID in each
      SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
      field has been reduced from two bytes to one, and the maximum number
      of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
      be possible to remove these restrictions by increasing the size of
      SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
      like a good trade-off.
      
      Review by Jaime Casanova and Tom Lane.
  5. 26 Feb, 2010 · 1 commit
  6. 10 Feb, 2010 · 1 commit
    • Fix up rickety handling of relation-truncation interlocks. · cbe9d6be
      Committed by Tom Lane
      Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
      relation entries, so that they will get reset to InvalidBlockNumber whenever
      an smgr-level flush happens.  Because we now send smgr invalidation messages
      immediately (not at end of transaction) when a relation truncation occurs,
      this ensures that other backends will reset their values before they next
      access the relation.  We no longer need the unreliable assumption that a
      VACUUM that's doing a truncation will hold its AccessExclusive lock until
      commit --- in fact, we can intentionally release that lock as soon as we've
      completed the truncation.  This patch therefore reverts (most of) Alvaro's
      patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
      also get rid of assorted no-longer-needed relcache flushes, which are far more
      expensive than an smgr flush because they kill a lot more state.
      
      In passing this patch fixes smgr_redo's failure to perform visibility-map
      truncation, and cleans up some rather dubious assumptions in freespace.c and
      visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
      date.
  7. 08 Feb, 2010 · 1 commit
    • Create a "relation mapping" infrastructure to support changing the relfilenodes · b9b8831a
      Committed by Tom Lane
      of shared or nailed system catalogs.  This has two key benefits:
      
      * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
      
      * We no longer have to use an unsafe reindex-in-place approach for reindexing
        shared catalogs.
      
      CLUSTER on nailed catalogs now works too, although I left it disabled on
      shared catalogs because the resulting pg_index.indisclustered update would
      only be visible in one database.
      
      Since reindexing shared system catalogs is now fully transactional and
      crash-safe, the former special cases in REINDEX behavior have been removed;
      shared catalogs are treated the same as non-shared.
      
      This commit does not do anything about the recently-discussed problem of
      deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
      concurrent queries; will address that in a separate patch.  As a stopgap,
      parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
      such failures during the regression tests.
  8. 04 Feb, 2010 · 1 commit
    • Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping · 9727c583
      Committed by Tom Lane
      of old and new toast tables can be done either at the logical level (by
      swapping the heaps' reltoastrelid links) or at the physical level (by swapping
      the relfilenodes of the toast tables and their indexes).  This is necessary
      infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared
      system catalogs, where we cannot change reltoastrelid.  The physical swap
      saves a few catalog updates too.
      
      We unfortunately have to keep the logical-level swap logic because in some
      cases we will be adding or deleting a toast table, so there's no possibility
      of a physical swap.  However, that only happens as a consequence of schema
      changes in the table, which we do not need to support for system catalogs,
      so such cases aren't an obstacle for that.
      
      In passing, refactor the cluster support functions a little bit to eliminate
      unnecessarily-duplicated code; and fix the problem that while CLUSTER had
      been taught to rename the final toast table at need, ALTER TABLE had not.
  9. 18 Jan, 2010 · 1 commit
    • Improve the handling of SET CONSTRAINTS commands by having them search · 9a915e59
      Committed by Tom Lane
      pg_constraint before searching pg_trigger.  This allows saner handling of
      corner cases; in particular we now say "constraint is not deferrable"
      rather than "constraint does not exist" when the command is applied to
      a constraint that's inherently non-deferrable.  Per a gripe several months
      ago from hubert depesz lubaczewski.
      
      To make this work without breaking user-defined constraint triggers,
      we have to add entries for them to pg_constraint.  However, in return
      we can remove the pgconstrname column from pg_trigger, which represents
      a fairly sizable space savings.  I also replaced the tgisconstraint column
      with tgisinternal; the old meaning of tgisconstraint can now be had by
      testing for nonzero tgconstraint, while there is no other way to get
      the old meaning of nonzero tgconstraint, namely that the trigger was
      internally generated rather than being user-created.
      
      In passing, fix an old misstatement in the docs and comments, namely that
      pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
      Actually, we mark RI action triggers as nondeferrable even when they belong to
      a nominally deferrable FK constraint.  The SET CONSTRAINTS code now relies on
      that instead of hard-coding a list of exception OIDs.
  10. 11 Jan, 2010 · 1 commit
  11. 03 Jan, 2010 · 1 commit
  12. 07 Dec, 2009 · 1 commit
  13. 21 Nov, 2009 · 1 commit
    • Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be · 7fc0f062
      Committed by Tom Lane
      checked to determine whether the trigger should be fired.
      
      For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
      triggers it can provide a noticeable performance improvement, since queuing of
      a deferred trigger event and re-fetching of the row(s) at end of statement can
      be short-circuited if the trigger does not need to be fired.
      
      Takahiro Itagaki, reviewed by KaiGai Kohei.
  14. 28 Jul, 2009 · 1 commit
    • Add system catalog columns pg_constraint.conindid and pg_trigger.tgconstrindid. · c1b9ec24
      Committed by Tom Lane
      conindid is the index supporting a constraint.  We can use this not only for
      unique/primary-key constraints, but also foreign-key constraints, which
      depend on the unique index that constrains the referenced columns.
      tgconstrindid is just copied from the constraint's conindid field, or is
      zero for triggers not associated with constraints.
      
      This is mainly intended as infrastructure for upcoming patches, but it has
      some virtue in itself, since it exposes a relationship that you formerly
      had to grovel in pg_depend to determine.  I simplified one information_schema
      view accordingly.  (There is a pg_dump query that could also use conindid,
      but I left it alone because it wasn't clear it'd get any faster.)
  15. 11 Jun, 2009 · 1 commit
  16. 01 Apr, 2009 · 1 commit
    • Modify the relcache to record the temp status of both local and nonlocal · 948d6ec9
      Committed by Tom Lane
      temp relations; this is no more expensive than before, now that we have
      pg_class.relistemp.  Insert tests into bufmgr.c to prevent attempting
      to fetch pages from nonlocal temp relations.  This provides a low-level
      defense against bugs-of-omission allowing temp pages to be loaded into shared
      buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop.
      While at it, tweak a bunch of places to use new relcache tests (instead of
      expensive probes into pg_namespace) to detect local or nonlocal temp tables.
  17. 10 Feb, 2009 · 1 commit
  18. 02 Jan, 2009 · 1 commit
  19. 03 Dec, 2008 · 1 commit
    • Introduce visibility map. The visibility map is a bitmap with one bit per · 608195a3
      Committed by Heikki Linnakangas
      heap page, where a set bit indicates that all tuples on the page are
      visible to all transactions, and the page therefore doesn't need
      vacuuming. It is stored in a new relation fork.
      
      Lazy vacuum uses the visibility map to skip pages that don't need
      vacuuming. Vacuum is also responsible for setting the bits in the map.
      In the future, this can hopefully be used to implement index-only-scans,
      but we can't currently guarantee that the visibility map is always 100%
      up-to-date.
      
      In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on
      each heap page, also indicating that all tuples on the page are visible to
      all transactions. It's important that this flag is kept up-to-date. It
      is also used to skip visibility tests in sequential scans, which gives a
      small performance gain on seqscans.
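
The bitmap described above can be illustrated with a minimal C sketch. This is not the PostgreSQL implementation: the `VisMap` type and the `vismap_*` and `pages_to_vacuum` names are hypothetical, and the map lives in plain heap memory rather than in a relation fork. It only shows the one-bit-per-heap-page idea and how a lazy vacuum could count the pages it still has to visit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory visibility map: one bit per heap page. */
typedef struct VisMap {
    uint8_t *bits;     /* packed bitmap, 8 heap pages per byte */
    uint32_t npages;   /* number of heap pages covered */
} VisMap;

VisMap *vismap_create(uint32_t npages)
{
    size_t nbytes = ((size_t) npages + 7) / 8;
    VisMap *vm = malloc(sizeof(VisMap));

    if (vm == NULL)
        return NULL;
    vm->bits = calloc(nbytes > 0 ? nbytes : 1, 1);  /* all bits start clear */
    if (vm->bits == NULL) {
        free(vm);
        return NULL;
    }
    vm->npages = npages;
    return vm;
}

void vismap_free(VisMap *vm)
{
    free(vm->bits);
    free(vm);
}

/* Vacuum sets the bit once every tuple on the page is visible to all. */
void vismap_set(VisMap *vm, uint32_t page)
{
    assert(page < vm->npages);
    vm->bits[page / 8] |= (uint8_t) (1u << (page % 8));
}

/* Any modification of the page must clear the bit again. */
void vismap_clear(VisMap *vm, uint32_t page)
{
    assert(page < vm->npages);
    vm->bits[page / 8] &= (uint8_t) ~(1u << (page % 8));
}

bool vismap_test(const VisMap *vm, uint32_t page)
{
    assert(page < vm->npages);
    return (vm->bits[page / 8] >> (page % 8)) & 1u;
}

/* Count the pages a lazy vacuum could not skip. */
uint32_t pages_to_vacuum(const VisMap *vm)
{
    uint32_t count = 0;

    for (uint32_t page = 0; page < vm->npages; page++)
        if (!vismap_test(vm, page))
            count++;
    return count;
}
```

A clear bit is always the safe state in this sketch: it only costs an unnecessary page visit, whereas a wrongly set bit would cause a page to be skipped.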
  20. 27 Nov, 2008 · 1 commit
  21. 30 Sep, 2008 · 1 commit
    • Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the · 15c121b3
      Committed by Heikki Linnakangas
      free space information is stored in a dedicated FSM relation fork, with each
      relation (except for hash indexes; they don't use FSM).
      
      This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
      trace of them from the backend, initdb, and documentation.
      
      Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
      introduce a new variant of the get_raw_page(regclass, int4, int4) function in
      contrib/pageinspect that lets you return pages from any relation fork, and
      a new fsm_page_contents() function to inspect the new FSM pages.
  22. 19 Jun, 2008 · 1 commit
  23. 11 Apr, 2008 · 1 commit
    • Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole · 4e82a954
      Committed by Tom Lane
      indexscan always occurs in one call, and the results are returned in a
      TIDBitmap instead of a limited-size array of TIDs.  This should improve
      speed a little by reducing AM entry/exit overhead, and it is necessary
      infrastructure if we are ever to support bitmap indexes.
      
      In an only slightly related change, add support for TIDBitmaps to preserve
      (somewhat lossily) the knowledge that particular TIDs reported by an index
      need to have their quals rechecked when the heap is visited.  This facility
      is not really used yet; we'll need to extend the forced-recheck feature to
      plain indexscans before it's useful, and that hasn't been coded yet.
      The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
      with weighted queries.  There might be other uses in future, but that one
      alone is sufficient reason.
      
      Heikki Linnakangas, with some adjustments by me.
  24. 28 Mar, 2008 · 1 commit
  25. 02 Jan, 2008 · 1 commit
  26. 16 Nov, 2007 · 1 commit
  27. 21 Sep, 2007 · 1 commit
    • HOT updates. When we update a tuple without changing any of its indexed · 282d2a03
      Committed by Tom Lane
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
  28. 27 May, 2007 · 1 commit
    • Fix up pgstats counting of live and dead tuples to recognize that committed · 77947c51
      Committed by Tom Lane
      and aborted transactions have different effects; also teach it not to assume
      that prepared transactions are always committed.
      
      Along the way, simplify the pgstats API by tying counting directly to
      Relations; I cannot detect any redeeming social value in having stats
      pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
      corner cases in which counts might be missed because the relation's
      pgstat_info pointer hadn't been set.
  29. 29 Mar, 2007 · 1 commit
  30. 20 Mar, 2007 · 1 commit
    • Change pg_trigger and extend pg_rewrite in order to allow triggers and · 0fe16500
      Committed by Jan Wieck
      rules to be defined with different, per session controllable, behaviors
      for replication purposes.
      
      This will allow replication systems like Slony-I and, as has been stated
      on pgsql-hackers, other products to control the firing mechanism of
      triggers and rewrite rules without modifying the system catalog directly.
      
      The firing mechanisms are controlled by a new superuser-only GUC
      variable, session_replication_role, together with a change to
      pg_trigger.tgenabled and a new column pg_rewrite.ev_enabled. Both
      columns are a single char data type now (tgenabled was a bool before).
      The possible values in these attributes are:
      
           'O' - Trigger/Rule fires when session_replication_role is "origin"
                 (default) or "local". This is the default behavior.
      
           'D' - Trigger/Rule is disabled and never fires
      
           'A' - Trigger/Rule fires always regardless of the setting of
                 session_replication_role
      
           'R' - Trigger/Rule fires when session_replication_role is "replica"
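
The four values listed above amount to a small decision table. The following C sketch shows that table under stated assumptions: the `ReplicationRole` enum and the `rule_or_trigger_fires` function are hypothetical names chosen for illustration, not identifiers taken from the patch, and treating an unrecognized character as disabled is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three session_replication_role settings. */
typedef enum {
    ROLE_ORIGIN,    /* "origin", the default */
    ROLE_REPLICA,   /* "replica" */
    ROLE_LOCAL      /* "local" */
} ReplicationRole;

/*
 * Decide whether a trigger or rule fires, given its enabled character
 * ('O', 'D', 'A' or 'R') and the current session role.
 */
bool rule_or_trigger_fires(char enabled, ReplicationRole role)
{
    assert(role == ROLE_ORIGIN || role == ROLE_REPLICA || role == ROLE_LOCAL);

    switch (enabled) {
        case 'O':   /* fires in "origin" and "local" sessions */
            return role == ROLE_ORIGIN || role == ROLE_LOCAL;
        case 'R':   /* fires only in "replica" sessions */
            return role == ROLE_REPLICA;
        case 'A':   /* fires regardless of the session role */
            return true;
        case 'D':   /* disabled */
            return false;
        default:    /* assumption of this sketch: unknown means disabled */
            return false;
    }
}
```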
      
      The GUC variable can only be changed as long as the system does not have
      any cached query plans. This will prevent changing the session role and
      accidentally executing stored procedures or functions that have plans
      cached that expand to the wrong query set due to differences in the rule
      firing semantics.
      
      The SQL syntax for changing a trigger's or rule's firing semantics is:
      
           ALTER TABLE <tabname> <when> TRIGGER|RULE <name>;
      
           <when> ::= ENABLE | ENABLE ALWAYS | ENABLE REPLICA | DISABLE
      
      psql's \d command as well as pg_dump are extended in a backward
      compatible fashion.
      
      Jan
  31. 28 Feb, 2007 · 1 commit
    • Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len). · 234a02b2
      Committed by Tom Lane
      Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
      VARSIZE and VARDATA, and as a consequence almost no code was using the
      longer names.  Rename the length fields of struct varlena and various
      derived structures to catch anyplace that was accessing them directly;
      and clean up various places so caught.  In itself this patch doesn't
      change any behavior at all, but it is necessary infrastructure if we hope
      to play any games with the representation of varlena headers.
      Greg Stark and Tom Lane
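
To illustrate why routing every size access through macros matters, here is a small self-contained C sketch. It does not reproduce PostgreSQL's definitions: `DemoVarlena` and the `DEMO_*` macros are simplified, hypothetical stand-ins for `struct varlena`, `VARSIZE`, `VARDATA` and `SET_VARSIZE`. Because callers never touch the header field directly, the header encoding could later change by editing only the macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a variable-length datum (not the real layout). */
typedef struct DemoVarlena {
    uint32_t hdr_;     /* header word; callers must not access it directly */
    char     data_[];  /* payload bytes follow the header */
} DemoVarlena;

#define DEMO_HDRSZ              ((uint32_t) sizeof(uint32_t))
#define DEMO_VARSIZE(p)         ((p)->hdr_)              /* total size, header included */
#define DEMO_VARDATA(p)         ((p)->data_)             /* pointer to the payload */
#define DEMO_SET_VARSIZE(p, n)  ((p)->hdr_ = (uint32_t) (n))

/* Build a datum from len bytes of src, using only the accessor macros. */
DemoVarlena *demo_make(const char *src, size_t len)
{
    DemoVarlena *v;

    assert(len <= UINT32_MAX - DEMO_HDRSZ);
    v = malloc(DEMO_HDRSZ + len);
    if (v == NULL)
        return NULL;
    DEMO_SET_VARSIZE(v, DEMO_HDRSZ + len);
    memcpy(DEMO_VARDATA(v), src, len);
    return v;
}

/* Payload length is the total size minus the header. */
size_t demo_payload_len(const DemoVarlena *v)
{
    return DEMO_VARSIZE(v) - DEMO_HDRSZ;
}
```

The trailing underscores on the field names echo the renaming trick the commit describes: any code that still pokes at the old field names fails to compile and is easy to find.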
  32. 14 Feb, 2007 · 1 commit
    • Fix up foreign-key mechanism so that there is a sound semantic basis for the · 7bddca34
      Committed by Tom Lane
      equality checks it applies, instead of a random dependence on whatever
      operators might be named "=".  The equality operators will now be selected
      from the opfamily of the unique index that the FK constraint depends on to
      enforce uniqueness of the referenced columns; therefore they are certain to be
      consistent with that index's notion of equality.  Among other things this
      should fix the problem noted a while back that pg_dump may fail for foreign-key
      constraints on user-defined types when the required operators aren't in the
      search path.  This also means that the former warning condition about "foreign
      key constraint will require costly sequential scans" is gone: if the
      comparison condition isn't indexable then we'll reject the constraint
      entirely. All per past discussions.
      
      Along the way, make the RI triggers look into pg_constraint for their
      information, instead of using pg_trigger.tgargs; and get rid of the always
      error-prone fixed-size string buffers in ri_triggers.c in favor of building up
      the RI queries in StringInfo buffers.
      
      initdb forced due to columns added to pg_constraint and pg_trigger.
  33. 25 Jan, 2007 · 1 commit
  34. 09 Jan, 2007 · 1 commit
    • Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST · 44317582
      Committed by Tom Lane
      per-column options for btree indexes.  The planner's support for this is still
      pretty rudimentary; it does not yet know how to plan mergejoins with
      nondefault ordering options.  The documentation is pretty rudimentary, too.
      I'll work on improving that stuff later.
      
      Note incompatible change from prior behavior: ORDER BY ... USING will now be
      rejected if the operator is not a less-than or greater-than member of some
      btree opclass.  This prevents less-than-sane behavior if an operator that
      doesn't actually define a proper sort ordering is selected.
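
As a rough illustration of what per-column ASC/DESC and NULLS FIRST/NULLS LAST options mean for a comparison routine, here is a minimal C sketch. The `NullableInt` and `SortOptions` types and the `compare_nullable` function are hypothetical and unrelated to the btree code touched by the commit; the sketch only shows that NULL placement is decided separately from the direction applied to non-NULL values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical nullable integer column value. */
typedef struct {
    bool is_null;
    int  value;      /* meaningful only when is_null is false */
} NullableInt;

/* Hypothetical per-column ordering options. */
typedef struct {
    bool descending;   /* DESC instead of ASC */
    bool nulls_first;  /* NULLS FIRST instead of NULLS LAST */
} SortOptions;

/* Returns <0, 0 or >0, like a qsort-style comparator. */
int compare_nullable(const NullableInt *a, const NullableInt *b,
                     const SortOptions *opt)
{
    int cmp;

    assert(a != NULL && b != NULL && opt != NULL);

    if (a->is_null || b->is_null) {
        if (a->is_null && b->is_null)
            return 0;
        /* NULL placement depends on nulls_first only, not on ASC/DESC. */
        if (a->is_null)
            return opt->nulls_first ? -1 : 1;
        return opt->nulls_first ? 1 : -1;
    }

    /* Overflow-safe three-way comparison of the non-NULL values. */
    cmp = (a->value > b->value) - (a->value < b->value);
    return opt->descending ? -cmp : cmp;
}
```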
  35. 06 Jan, 2007 · 1 commit
  36. 23 Dec, 2006 · 1 commit
    • Restructure operator classes to allow improved handling of cross-data-type · a78fcfb5
      Committed by Tom Lane
      cases.  Operator classes now exist within "operator families".  While most
      families are equivalent to a single class, related classes can be grouped
      into one family to represent the fact that they are semantically compatible.
      Cross-type operators are now naturally adjunct parts of a family, without
      having to wedge them into a particular opclass as we had done originally.
      
      This commit restructures the catalogs and cleans up enough of the fallout so
      that everything still works at least as well as before, but most of the work
      needed to actually improve the planner's behavior will come later.  Also,
      there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
      to create a new family right now is to allow CREATE OPERATOR CLASS to make
      one by default.  I owe some more documentation work, too.  But that can all
      be done in smaller pieces once this infrastructure is in place.
  37. 04 Oct, 2006 · 1 commit
  38. 04 Jul, 2006 · 1 commit
    • Code review for FILLFACTOR patch. Change WITH grammar as per earlier · b7b78d24
      Committed by Tom Lane
      discussion (including making def_arg allow reserved words), add missed
      opt_definition for UNIQUE case.  Put the reloptions support code in a less
      random place (I chose to make a new file access/common/reloptions.c).
      Eliminate header inclusion creep.  Make the index options functions safely
      user-callable (seems like client apps might like to be able to test validity
      of options before trying to make an index).  Reduce overhead for normal case
      with no options by allowing rd_options to be NULL.  Fix some unmaintainably
      klugy code, including getting rid of Natts_pg_class_fixed at long last.
      Some stylistic cleanup too, and pay attention to keeping comments in sync
      with code.
      
      Documentation still needs work, though I did fix the omissions in
      catalogs.sgml and indexam.sgml.
  39. 02 Jul, 2006 · 1 commit
  40. 26 Apr, 2006 · 1 commit
    • Arrange to cache btree metapage data in the relcache entry for the index, · d2896a9e
      Committed by Tom Lane
      thereby saving a visit to the metapage in most index searches/updates.
      This wouldn't actually save any I/O (since in the old regime the metapage
      generally stayed in cache anyway), but it does provide a useful decrease
      in bufmgr traffic in high-contention scenarios.  Per my recent proposal.