1. 15 Apr, 2010 1 commit
    • Tom Lane committed · 32616fb1
      Fix a problem introduced by my patch of 2010-01-12 that revised the way
      relcache reload works.  In the patched code, a relcache entry in process of
      being rebuilt doesn't get unhooked from the relcache hash table; which means
      that if a cache flush occurs due to sinval queue overrun while we're
      rebuilding it, the entry could get blown away by RelationCacheInvalidate,
      resulting in crash or misbehavior.  Fix by ensuring that an entry being
      rebuilt has positive refcount, so it won't be seen as a target for removal
      if a cache flush occurs.  (This will mean that the entry gets rebuilt twice
      in such a scenario, but that's okay.)  It appears that the problem can only
      arise within a transaction that has previously reassigned the relfilenode of
      a pre-existing table, via TRUNCATE or a similar operation.  Per bug #5412
      from Rusty Conover.
      
      Back-patch to 8.2, same as the patch that introduced the problem.
      I think that the failure can't actually occur in 8.2, since it lacks the
      rd_newRelfilenodeSubid optimization, but let's make it work like the later
      branches anyway.
      
      Patch by Heikki, slightly editorialized on by me.
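The pinning fix described in commit 32616fb1 can be illustrated with a small toy model. This is only a sketch under stated assumptions: `CacheEntry`, `flush_cache`, and `rebuild` are hypothetical names, not PostgreSQL functions, and the real relcache logic is considerably more involved. It shows why a positive refcount keeps an entry in the table during a flush, at the cost of rebuilding it twice.

```python
# Toy model only: CacheEntry, flush_cache and rebuild are hypothetical names,
# not PostgreSQL functions.
class CacheEntry:
    def __init__(self, name):
        self.name = name
        self.refcount = 0   # number of pins currently held
        self.valid = True
        self.rebuilds = 0   # how many times the entry has been rebuilt


def flush_cache(cache):
    """Full flush: drop unpinned entries, only invalidate pinned ones."""
    for name in list(cache):
        entry = cache[name]
        if entry.refcount > 0:
            entry.valid = False
        else:
            del cache[name]


def rebuild(cache, name, interrupt=None):
    """Rebuild cache[name] while holding a pin on it.

    `interrupt`, if given, is called once in the middle of the first
    rebuild attempt to simulate a flush arriving at that moment.
    """
    entry = cache[name]
    entry.refcount += 1              # pin: a flush may not remove the entry now
    try:
        while True:
            entry.valid = True
            entry.rebuilds += 1
            if interrupt is not None:
                pending, interrupt = interrupt, None
                pending(cache)
            if entry.valid:
                break                # nothing invalidated the entry; done
            # invalidated mid-rebuild: loop and rebuild a second time
    finally:
        entry.refcount -= 1          # unpin
    return entry
```

Calling `rebuild(cache, "t", interrupt=flush_cache)` leaves the pinned entry in the table and rebuilds it a second time, while unpinned entries are dropped by the simulated flush.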
  2. 14 Jan, 2010 1 commit
    • Tom Lane committed · 8a6a40de
      When loading critical system indexes into the relcache, ensure we lock the
      underlying catalog not only the index itself.  Otherwise, if the cache
      load process touches the catalog (which will happen for many though not
      all of these indexes), we are locking index before parent table, which can
      result in a deadlock against processes that are trying to lock them in the
      normal order.  Per today's failure on buildfarm member gothic_moth; it's
      surprising the problem hadn't been identified before.
      
      Back-patch to 8.2.  Earlier releases didn't have the issue because they
      didn't try to lock these indexes during load (instead assuming that they
      couldn't change schema at all during multiuser operation).
  3. 13 Jan, 2010 1 commit
    • Tom Lane committed · d4b7cf06
      Fix relcache reload mechanism to be more robust in the face of errors
      occurring during a reload, such as query-cancel.  Instead of zeroing out
      an existing relcache entry and rebuilding it in place, build a new relcache
      entry, then swap its contents with the old one, then free the new entry.
      This avoids problems with code believing that a previously obtained pointer
      to a cache entry must still reference a valid entry, as seen in recent
      failures on buildfarm member jaguar.  (jaguar is using CLOBBER_CACHE_ALWAYS
      which raises the probability of failure substantially, but the problem
      could occur in the field without that.)  The previous design was okay
      when it was made, but subtransactions and the ResourceOwner mechanism
      make it unsafe now.
      
      Also, make more use of the already existing rd_isvalid flag, so that we
      remember that the entry requires rebuilding even if the first attempt fails.
      
      Back-patch as far as 8.2.  Prior versions have enough issues around relcache
      reload anyway (due to inadequate locking) that fixing this one doesn't seem
      worthwhile.
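The build-then-swap approach from commit d4b7cf06 can be sketched roughly as follows. This is a minimal illustration, not the actual relcache code: `Entry`, `reload_entry`, and the `loader` callback are hypothetical. The point is that the existing object is never left half-initialized, and that a validity flag records a still-pending rebuild if the load fails.

```python
# Toy model only: Entry, reload_entry and loader are hypothetical names.
class Entry:
    def __init__(self, relid, payload):
        self.relid = relid
        self.payload = payload
        self.is_valid = True


def reload_entry(entry, loader):
    """Refresh `entry` by building a new entry and swapping contents.

    If `loader` raises (for example on a simulated query cancel), the old
    contents stay intact and `is_valid` stays False, so a later access
    knows the entry still needs rebuilding.
    """
    entry.is_valid = False
    fresh = Entry(entry.relid, loader(entry.relid))   # may raise
    # Swap contents; the identity of `entry` does not change, so pointers
    # obtained earlier still reference a usable entry.
    entry.payload, fresh.payload = fresh.payload, entry.payload
    entry.is_valid = True
    del fresh                                         # discard the old contents
```

Because the object identity is preserved, code that obtained a reference before the reload sees the refreshed contents rather than a dangling or zeroed entry.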
  4. 27 Sep, 2009 1 commit
    • Tom Lane committed · 8b720b57
      Fix RelationCacheInitializePhase2 (Phase3, in HEAD) to cope with the
      possibility of shared-inval messages causing a relcache flush while it tries
      to fill in missing data in preloaded relcache entries.  There are actually
      two distinct failure modes here:
      
      1. The flush could delete the next-to-be-processed cache entry, causing
      the subsequent hash_seq_search calls to go off into the weeds.  This is
      the problem reported by Michael Brown, and I believe it also accounts
      for bug #5074.  The simplest fix is to restart the hashtable scan after
      we've read any new data from the catalogs.  It appears that pre-8.4
      branches have not suffered from this failure, because by chance there were
      no other catalogs sharing the same hash chains with the catalogs that
      RelationCacheInitializePhase2 had work to do for.  However that's obviously
      pretty fragile, and it seems possible that derivative versions with
      additional system catalogs might be vulnerable, so I'm back-patching this
      part of the fix anyway.
      
      2. The flush could delete the *current* cache entry, in which case the
      pointer to the newly-loaded data would end up being stored into an
      already-deleted Relation struct.  As long as it was still deleted, the only
      consequence would be some leaked space in CacheMemoryContext.  But it seems
      possible that the Relation struct could already have been recycled, in
      which case this represents a hard-to-reproduce clobber of cached data
      structures, with unforeseeable consequences.  The fix here is to pin the
      entry while we work on it.
      
      In passing, also change RelationCacheInitializePhase2 to Assert that
      formrdesc() set up the relation's cached TupleDesc (rd_att) with the
      correct type OID and hasoids values.  This is more appropriate than
      silently updating the values, because the original tupdesc might already
      have been copied into the catcache.  However this part of the patch is
      not in HEAD because it fails due to some questionable recent changes in
      formrdesc :-(.  That will be cleaned up in a subsequent patch.
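The two fixes in commit 8b720b57 (restart the hashtable scan after reading new data, and pin the entry being worked on) can be modeled with a short sketch. It assumes a plain dictionary in place of the relcache hash table; `fill_missing` and the `load` callback are hypothetical names used only for illustration.

```python
# Toy model only: fill_missing and load are hypothetical names; a dict
# stands in for the relcache hash table.
def fill_missing(cache, load):
    """Fill in entries whose data is missing.

    `load(name, cache)` may remove other, unpinned entries from `cache`
    as a side effect, which models a flush during a catalog read.
    """
    done = False
    while not done:
        done = True
        for name, entry in cache.items():
            if entry["data"] is not None:
                continue
            entry["pins"] += 1        # pin the current entry while loading
            try:
                entry["data"] = load(name, cache)
            finally:
                entry["pins"] -= 1
            done = False              # the table may have changed ...
            break                     # ... so abandon this scan and restart
```

Abandoning the scan after every load means the iteration never continues over a table that may have lost entries, at the cost of rescanning from the start.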
  5. 30 Dec, 2008 1 commit
  6. 11 Aug, 2008 1 commit
    • Tom Lane committed · e9ec4bbf
      Fix corner-case bug introduced with HOT: if REINDEX TABLE pg_class (or a
      REINDEX DATABASE including same) is done before a session has done any other
      update on pg_class, the pg_class relcache entry was left with an incorrect
      setting of rd_indexattr, because the indexed-attributes set would be first
      demanded at a time when we'd forced a partial list of indexes into the
      pg_class entry, and it would remain cached after that.  This could result
      in incorrect decisions about HOT-update safety later in the same session.
      In practice, since only pg_class_relname_nsp_index would be missed out,
      only ALTER TABLE RENAME and ALTER TABLE SET SCHEMA could trigger a problem.
      Per report and test case from Ondrej Jirman.
  7. 17 Apr, 2008 1 commit
    • Tom Lane committed · 95b7a876
      Fix LOAD_CRIT_INDEX() macro to take out AccessShareLock on the system index
      it is trying to build a relcache entry for.  This is an oversight in my 8.2
      patch that tried to ensure we always took a lock on a relation before trying
      to build its relcache entry.  The implication is that if someone committed a
      reindex of a critical system index at about the same time that some other
      backend were starting up without a valid pg_internal.init file, the second one
      might PANIC due to not seeing any valid version of the index's pg_class row.
      Improbable case, but definitely not impossible.
  8. 01 Apr, 2008 1 commit
    • Tom Lane committed · e3a47483
      Fix an oversight I made in a cleanup patch over a year ago:
      eval_const_expressions needs to be passed the PlannerInfo ("root") structure,
      because in some cases we want it to substitute values for Param nodes.
      (So "constant" is not so constant as all that ...)  This mistake partially
      disabled optimization of unnamed extended-Query statements in 8.3: in
      particular the LIKE-to-indexscan optimization would never be applied if the
      LIKE pattern was passed as a parameter, and constraint exclusion depending
      on a parameter value didn't work either.
  9. 28 Feb, 2008 1 commit
  10. 02 Jan, 2008 1 commit
  11. 29 Nov, 2007 1 commit
    • Tom Lane committed · 03ffc4d6
      Improve test coverage of CLOBBER_CACHE_ALWAYS by having it also force
      reloading of operator class information on each use of LookupOpclassInfo.
      Had this been in place a year ago, it would have helped me find a bug
      in the then-new 'operator family' code.  Now that we have a build farm
      member testing CLOBBER_CACHE_ALWAYS on a regular basis, it seems worth
      expending a little bit of effort here.
  12. 16 Nov, 2007 1 commit
  13. 21 Sep, 2007 1 commit
    • Tom Lane committed · 282d2a03
      HOT updates. When we update a tuple without changing any of its indexed
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
  14. 26 Jul, 2007 1 commit
    • Tom Lane committed · 82eed4db
      Arrange to put TOAST tables belonging to temporary tables into special schemas
      named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
      tables themselves.  This allows low-level code such as the relcache to
      recognize that these tables are indeed temporary, which enables various
      optimizations such as not WAL-logging changes and using local rather than
      shared buffers for access.  Aside from obvious performance benefits, this
      provides a solution to bug #3483, in which other backends unexpectedly held
      open file references to temporary tables.  The scheme preserves the property
      that TOAST tables are not in any schema that's normally in the search path,
      so they don't conflict with user table names.
      
      initdb forced because of changes in system view definitions.
  15. 27 May, 2007 1 commit
    • Tom Lane committed · 77947c51
      Fix up pgstats counting of live and dead tuples to recognize that committed
      and aborted transactions have different effects; also teach it not to assume
      that prepared transactions are always committed.
      
      Along the way, simplify the pgstats API by tying counting directly to
      Relations; I cannot detect any redeeming social value in having stats
      pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
      corner cases in which counts might be missed because the relation's
      pgstat_info pointer hadn't been set.
  16. 03 May, 2007 1 commit
    • Tom Lane committed · 8ec94385
      Fix things so that when CREATE INDEX CONCURRENTLY sets pg_index.indisvalid
      true at the very end of its processing, the update is broadcast via a
      shared-cache-inval message for the index; without this, existing backends that
      already have relcache entries for the index might never see it become valid.
      Also, force a relcache inval on the index's parent table at the same time,
      so that any cached plans for that table are re-planned; this ensures that
      the newly valid index will be used if appropriate.  Aside from making
      C.I.C. behave more reasonably, this is necessary infrastructure for some
      aspects of the HOT patch.  Pavan Deolasee, with a little further stuff from
      me.
  17. 29 Mar, 2007 1 commit
  18. 20 Mar, 2007 1 commit
    • Jan Wieck committed · 0fe16500
      Changes pg_trigger and extend pg_rewrite in order to allow triggers and
      rules to be defined with different, per session controllable, behaviors
      for replication purposes.
      
      This will allow replication systems like Slony-I and, as has been stated
      on pgsql-hackers, other products to control the firing mechanism of
      triggers and rewrite rules without modifying the system catalog directly.
      
      The firing mechanisms are controlled by a new superuser-only GUC
      variable, session_replication_role, together with a change to
      pg_trigger.tgenabled and a new column pg_rewrite.ev_enabled. Both
      columns are a single char data type now (tgenabled was a bool before).
      The possible values in these attributes are:
      
           'O' - Trigger/Rule fires when session_replication_role is "origin"
                 (default) or "local". This is the default behavior.
      
           'D' - Trigger/Rule is disabled and never fires
      
           'A' - Trigger/Rule fires always regardless of the setting of
                 session_replication_role
      
           'R' - Trigger/Rule fires when session_replication_role is "replica"
      
      The GUC variable can only be changed as long as the system does not have
      any cached query plans. This will prevent changing the session role and
      accidentally executing stored procedures or functions that have plans
      cached that expand to the wrong query set due to differences in the rule
      firing semantics.
      
      The SQL syntax for changing a trigger's or rule's firing semantics is
      
           ALTER TABLE <tabname> <when> TRIGGER|RULE <name>;
      
           <when> ::= ENABLE | ENABLE ALWAYS | ENABLE REPLICA | DISABLE
      
      psql's \d command as well as pg_dump are extended in a backward
      compatible fashion.
      
      Jan
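The firing rules listed in commit 0fe16500 amount to a small decision table. The sketch below restates them as a function; `fires` is a hypothetical helper written for illustration, not part of PostgreSQL, and it only models the four codes and three role names given in the commit message.

```python
# Illustrative restatement of the table in the commit message; `fires` is a
# hypothetical helper, not a PostgreSQL function.
def fires(enabled: str, role: str) -> bool:
    """Return True if a trigger/rule with the given enabled code fires
    under the given session_replication_role setting."""
    if role not in ("origin", "local", "replica"):
        raise ValueError(f"unknown session_replication_role: {role!r}")
    if enabled == "D":                  # disabled: never fires
        return False
    if enabled == "A":                  # always: fires regardless of role
        return True
    if enabled == "O":                  # default: fires for origin or local
        return role in ("origin", "local")
    if enabled == "R":                  # replica-only
        return role == "replica"
    raise ValueError(f"unknown enabled code: {enabled!r}")
```

For example, a trigger marked 'O' is skipped while the session role is "replica", which is the behavior a replication system relies on when applying changes on a subscriber.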
  19. 04 Mar, 2007 1 commit
  20. 28 Feb, 2007 1 commit
    • Tom Lane committed · 234a02b2
      Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).
      Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
      VARSIZE and VARDATA, and as a consequence almost no code was using the
      longer names.  Rename the length fields of struct varlena and various
      derived structures to catch anyplace that was accessing them directly;
      and clean up various places so caught.  In itself this patch doesn't
      change any behavior at all, but it is necessary infrastructure if we hope
      to play any games with the representation of varlena headers.
      Greg Stark and Tom Lane
  21. 25 Jan, 2007 1 commit
  22. 09 Jan, 2007 1 commit
    • Tom Lane committed · 44317582
      Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST
      per-column options for btree indexes.  The planner's support for this is still
      pretty rudimentary; it does not yet know how to plan mergejoins with
      nondefault ordering options.  The documentation is pretty rudimentary, too.
      I'll work on improving that stuff later.
      
      Note incompatible change from prior behavior: ORDER BY ... USING will now be
      rejected if the operator is not a less-than or greater-than member of some
      btree opclass.  This prevents less-than-sane behavior if an operator that
      doesn't actually define a proper sort ordering is selected.
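The ordering semantics added by commit 44317582 can be illustrated with a small model. `order_by` is a hypothetical helper, with `None` standing in for SQL NULL. The default used when no NULLS option is given (NULLs sort as if larger than any non-null value, so last for ASC and first for DESC) follows PostgreSQL's documented behavior and is not stated in the commit message itself.

```python
# Toy model of ORDER BY ... [ASC|DESC] [NULLS FIRST|NULLS LAST];
# `order_by` is a hypothetical helper and None stands in for SQL NULL.
def order_by(values, descending=False, nulls_first=None):
    if nulls_first is None:
        # Assumed default: NULLs sort as if larger than any non-null value,
        # i.e. NULLS LAST for ASC and NULLS FIRST for DESC.
        nulls_first = descending
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls
```

Passing `nulls_first` explicitly corresponds to writing NULLS FIRST or NULLS LAST in the query, which overrides the direction-dependent default.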
  23. 06 Jan, 2007 1 commit
  24. 01 Jan, 2007 1 commit
    • Tom Lane committed · 0b56be83
      Found the problem with my operator-family changes: by fetching from
      pg_opclass during LookupOpclassInfo(), I'd turned pg_opclass_oid_index
      into a critical system index.  However the problem could only manifest
      during a backend's first attempt to load opclass data, and then only
      if it had successfully loaded pg_internal.init and subsequently received
      a relcache flush; which made it impossible to reproduce in sequential
      tests and darn hard even in parallel tests.  Memo to self: when
      exercising cache flush scenarios, must disable LookupOpclassInfo's
      internal cache too.
  25. 23 Dec, 2006 1 commit
    • Tom Lane committed · a78fcfb5
      Restructure operator classes to allow improved handling of cross-data-type
      cases.  Operator classes now exist within "operator families".  While most
      families are equivalent to a single class, related classes can be grouped
      into one family to represent the fact that they are semantically compatible.
      Cross-type operators are now naturally adjunct parts of a family, without
      having to wedge them into a particular opclass as we had done originally.
      
      This commit restructures the catalogs and cleans up enough of the fallout so
      that everything still works at least as well as before, but most of the work
      needed to actually improve the planner's behavior will come later.  Also,
      there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
      to create a new family right now is to allow CREATE OPERATOR CLASS to make
      one by default.  I owe some more documentation work, too.  But that can all
      be done in smaller pieces once this infrastructure is in place.
  26. 06 Nov, 2006 1 commit
    • Tom Lane committed · 76d5667b
      Fix recently-identified PITR recovery hazard: the base backup could contain
      stale relcache init files (pg_internal.init), and there is no mechanism for
      updating them during WAL replay.  Easiest solution is just to delete the init
      files at conclusion of startup, and let the first backend started in each
      database take care of rebuilding the init file.  Simon Riggs and Tom Lane.
      
      Back-patched to 8.1.  Arguably this should be fixed in 8.0 too, but it would
      require significantly more code since 8.0 has no handy startup-time scan of
      pg_database to piggyback on.  Manual solution of the problem is possible
      in 8.0 (just delete the pg_internal.init files before starting WAL replay),
      so that may be a sufficient answer.
  27. 04 Oct, 2006 1 commit
  28. 06 Sep, 2006 1 commit
    • Tom Lane committed · 7bae5a28
      Get rid of the separate RULE privilege for tables: now only a table's owner
      can create or modify rules for the table.  Do setRuleCheckAsUser() while
      loading rules into the relcache, rather than when defining a rule.  This
      ensures that permission checks for tables referenced in a rule are done with
      respect to the current owner of the rule's table, whereas formerly ALTER TABLE
      OWNER would fail to update the permission checking for associated rules.
      Removal of separate RULE privilege is needed to prevent various scenarios
      in which a grantee of RULE privilege could effectively have any privilege
      of the table owner.  For backwards compatibility, GRANT/REVOKE RULE is still
      accepted, but it doesn't do anything.  Per discussion here:
      http://archives.postgresql.org/pgsql-hackers/2006-04/msg01138.php
  29. 01 Aug, 2006 1 commit
    • Tom Lane committed · 09d3670d
      Change the relation_open protocol so that we obtain lock on a relation
      (table or index) before trying to open its relcache entry.  This fixes
      race conditions in which someone else commits a change to the relation's
      catalog entries while we are in process of doing relcache load.  Problems
      of that ilk have been reported sporadically for years, but it was not
      really practical to fix until recently --- for instance, the recent
      addition of WAL-log support for in-place updates helped.
      
      Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
      concurrent update.
  30. 14 Jul, 2006 2 commits
  31. 04 Jul, 2006 1 commit
    • Tom Lane committed · b7b78d24
      Code review for FILLFACTOR patch. Change WITH grammar as per earlier
      discussion (including making def_arg allow reserved words), add missed
      opt_definition for UNIQUE case.  Put the reloptions support code in a less
      random place (I chose to make a new file access/common/reloptions.c).
      Eliminate header inclusion creep.  Make the index options functions safely
      user-callable (seems like client apps might like to be able to test validity
      of options before trying to make an index).  Reduce overhead for normal case
      with no options by allowing rd_options to be NULL.  Fix some unmaintainably
      klugy code, including getting rid of Natts_pg_class_fixed at long last.
      Some stylistic cleanup too, and pay attention to keeping comments in sync
      with code.
      
      Documentation still needs work, though I did fix the omissions in
      catalogs.sgml and indexam.sgml.
  32. 02 Jul, 2006 1 commit
  33. 17 Jun, 2006 1 commit
    • Tom Lane committed · 06e10abc
      Fix problems with cached tuple descriptors disappearing while still in use
      by creating a reference-count mechanism, similar to what we did a long time
      ago for catcache entries.  The back branches have an ugly solution involving
      lots of extra copies, but this way is more efficient.  Reference counting is
      only applied to tupdescs that are actually in caches --- there seems no need
      to use it for tupdescs that are generated in the executor, since they'll go
      away during plan shutdown by virtue of being in the per-query memory context.
      Neil Conway and Tom Lane
  34. 06 May, 2006 1 commit
  35. 05 May, 2006 1 commit
  36. 26 Apr, 2006 1 commit
    • Tom Lane committed · d2896a9e
      Arrange to cache btree metapage data in the relcache entry for the index,
      thereby saving a visit to the metapage in most index searches/updates.
      This wouldn't actually save any I/O (since in the old regime the metapage
      generally stayed in cache anyway), but it does provide a useful decrease
      in bufmgr traffic in high-contention scenarios.  Per my recent proposal.
  37. 05 Mar, 2006 1 commit
  38. 20 Jan, 2006 1 commit
    • Tom Lane committed · ed69cf5d
      Avoid crashing if relcache flush occurs while trying to load data into an
      index's support-function cache (in index_getprocinfo).  Since none of that
      data can change for an index that's in active use, it seems sufficient to
      treat all open indexes the same way we were treating "nailed" system indexes
      --- that is, just re-read the pg_class row and leave the rest of the relcache
      entry strictly alone.  The pg_class re-read might not be strictly necessary
      either, but since the reltablespace and relfilenode can change in normal
      operation it seems safest to do it.  (We don't support changing any of the
      other info about an index at all, at the moment.)
      
      Back-patch as far as 8.0.  It might be possible to adapt the patch to 7.4,
      but it would take more work than I care to expend for such a low-probability
      problem.  7.3 is out of luck for sure.
  39. 19 Jan, 2006 1 commit