1. 17 Jan 2018, 1 commit
  2. 13 Jan 2018, 2 commits
    • Remove mirrored flatfile stuff. · 7a249d7e
      Committed by Heikki Linnakangas
      Revert the code to open/read/write regular files, back to the way it is in
      upstream.
    • Remove a lot of persistent table and mirroring stuff. · 5c158ff3
      Committed by Heikki Linnakangas
      * Revert almost all the changes in smgr.c / md.c, to not go through
        the Mirrored* APIs.
      
      * Remove mmxlog stuff. Use upstream "pending relation deletion" code
        instead.
      
      * Get rid of multiple startup passes. Now it's just a single pass like
        in the upstream.
      
      * Revert the way database drop/create are handled to the way it is in
        upstream. Doesn't use PT anymore, but accesses file system directly,
        and WAL-logs a single CREATE/DROP DATABASE WAL record.
      
      * Get rid of MirroredLock
      
      * Remove a few tests that were specific to persistent tables.
      
      * Plus a lot of little removals and reverts to upstream code.
  3. 01 Sep 2017, 1 commit
  4. 24 Aug 2017, 1 commit
  5. 29 Jun 2017, 1 commit
    • Change the way OIDs are preserved during pg_upgrade. · f51f2f57
      Committed by Heikki Linnakangas
      Instead of meticulously recording the OIDs of each object in the pg_dump
      output, dump and load all OIDs as separate steps in pg_upgrade.
      
      We now only preserve OIDs of types, relations and schemas from the old
      cluster. Other objects are assigned new OIDs as part of the restore.
      To ensure the OIDs are consistent between the QD and QEs, we dump the
      (new) OIDs of all objects to a file, after upgrading the QD node, and use
      those OIDs when restoring the QE nodes. We were already using a similar
      mechanism for new array types, but we now do that for all objects.
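
      As an illustration of the scheme this commit describes, here is a minimal,
      self-contained C sketch (not the pg_upgrade code itself): after the QD node is
      upgraded, object OIDs are dumped to a plain text file, and that file is consulted
      when restoring the QE nodes so they assign the same OIDs. The file format and all
      function names are invented for the example.

      #include <stdio.h>
      #include <string.h>

      typedef unsigned int Oid;

      /* Write "name oid" pairs to a file after upgrading the QD. */
      static void
      dump_oids(const char *path, const char *names[], const Oid oids[], int n)
      {
          FILE *f = fopen(path, "w");

          for (int i = 0; i < n; i++)
              fprintf(f, "%s %u\n", names[i], oids[i]);
          fclose(f);
      }

      /* Look up the OID recorded for an object name; 0 if not found. */
      static Oid
      lookup_oid(const char *path, const char *name)
      {
          FILE *f = fopen(path, "r");
          char  buf[128];
          Oid   oid;

          while (fscanf(f, "%127s %u", buf, &oid) == 2)
          {
              if (strcmp(buf, name) == 0)
              {
                  fclose(f);
                  return oid;
              }
          }
          fclose(f);
          return 0;
      }

      int
      main(void)
      {
          const char *names[] = {"public.t1", "public.t1_pkey"};
          Oid         oids[]  = {16385, 16388};

          dump_oids("preassigned_oids.txt", names, oids, 2);
          printf("public.t1 keeps OID %u when the QE is restored\n",
                 lookup_oid("preassigned_oids.txt", "public.t1"));
          return 0;
      }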
  6. 05 Apr 2017, 1 commit
    • Decouple OID and relfilenode allocations with new relfilenode counter · 1fd11387
      Committed by Jimmy Yih
      The master allocates an OID and provides it to segments during
      dispatch. The segments then check whether they can use this OID as the
      relfilenode. If a segment cannot use the preassigned OID as the
      relation's relfilenode, it will generate a new relfilenode value via the
      nextOid counter. This can result in a race condition between the
      generation of the new OID and the segment file being created on disk
      after being added to persistent tables. To combat this race condition,
      we have a small OID cache... but we have found in testing that it was
      not enough to prevent the issue.
      
      To fully solve the issue, we decouple OID and relfilenode on both QD and QE
      segments by introducing a nextRelfilenode counter which is similar to
      the nextOid counter. The QD segment will generate the OIDs and its own
      relfilenodes. The QE segments only use the preassigned OIDs from the QD
      dispatch and generate a relfilenode value from their own nextRelfilenode
      counter.
      
      Sequence generation is currently always done on the QD sequence server, and
      assumes the OID is always the same as the relfilenode when handling sequence client
      requests from QE segments. It is hard to change this assumption, so we have a
      special OID/relfilenode sync for sequence relations for GP_ROLE_DISPATCH and
      GP_ROLE_UTILITY.
      
      Reference gpdb-dev thread:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/lv6Sb4I6iSI
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
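
      A toy C sketch of the decoupling described above: the dispatcher hands out the
      OID, while relfilenodes come from a separate per-node counter, so the two values
      no longer have to match. The counter names echo the commit text, but the code is
      purely illustrative.

      #include <stdio.h>

      typedef unsigned int Oid;

      /* Hypothetical per-node counters, analogous to nextOid / nextRelfilenode. */
      static Oid nextOid = 16384;
      static Oid nextRelfilenode = 16384;

      static Oid
      assign_oid(void)
      {
          return nextOid++;
      }

      static Oid
      assign_relfilenode(void)
      {
          return nextRelfilenode++;
      }

      int
      main(void)
      {
          /* QD: allocates the OID that gets dispatched, plus its own relfilenode. */
          Oid dispatched_oid = assign_oid();
          Oid qd_relfilenode = assign_relfilenode();

          /*
           * QE: reuses the dispatched OID but draws its relfilenode from its own
           * counter; the two values are now independent.
           */
          Oid qe_oid = dispatched_oid;
          Oid qe_relfilenode = assign_relfilenode();

          printf("QD: oid=%u relfilenode=%u\n", dispatched_oid, qd_relfilenode);
          printf("QE: oid=%u relfilenode=%u\n", qe_oid, qe_relfilenode);
          return 0;
      }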
  7. 28 Feb 2017, 3 commits
    • Do not initialize relcache before xlog replay. · 6b9d44c9
      Committed by Ashwin Agrawal, Asim R P and Xin Zhang
      The need for heap access methods before xlog replay is removed by commit
      e2d6aa1481f6cdbd846d4b17b68eb4387dae9211. This commit simply moves the
      relcache initialization to pass4, where it is really needed.
      
      Do not bother to remove relcache init file at the end of crash recovery pass2.
      
      Error out if relation cache initialized at wrong time.
    • Remove unused debugging timeout in relcache · 602bfa8f
      Committed by Daniel Gustafsson
      Debug_gp_relation_node_fetch_wait_for_debugging was added to assist
      debugging an issue in the RelationFetchGpRelationNodeForXLog recursion
      guard. The bug was not reproduced, but the GUC was left in when the
      ticket was closed, so remove it. If we need to debug this in the future,
      we should add something back then.
    • Use tablespace oid as part of the dbInfoRel hash table key · a0db527d
      Committed by Jimmy Yih
      The current dbInfoRel hash table key only contains the relfilenode
      oid. However, relfilenode oids can be duplicated over different
      tablespaces which can cause dropdb (and possibly persistent rebuild)
      to fail. This commit adds the tablespace oid as part of the dbInfoRel
      hash table key for more uniqueness.
      
      One thing to note is that a constructed tablespace/relfilenode key is
      compared with other keys using memcmp. Supposedly... this should be
      fine since the struct just contains two OID variables and the keys are
      always palloc0'd. The alignment should be fine during comparison.
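
      A standalone C sketch of the key layout discussed in the note above: a struct
      holding the tablespace and relfilenode OIDs, zero-filled before the fields are
      set so that a raw memcmp comparison is not confused by uninitialized padding.
      calloc stands in for palloc0; the struct and field names are made up.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      typedef unsigned int Oid;

      /* Hypothetical two-part hash key, analogous to tablespace + relfilenode. */
      typedef struct DbInfoRelKey
      {
          Oid tablespaceOid;
          Oid relfilenodeOid;
      } DbInfoRelKey;

      /* Zero-fill before assigning fields, so any padding bytes compare equal. */
      static DbInfoRelKey *
      make_key(Oid tablespace, Oid relfilenode)
      {
          DbInfoRelKey *key = calloc(1, sizeof(DbInfoRelKey));  /* cf. palloc0 */

          key->tablespaceOid = tablespace;
          key->relfilenodeOid = relfilenode;
          return key;
      }

      int
      main(void)
      {
          DbInfoRelKey *a = make_key(1663, 16384);
          DbInfoRelKey *b = make_key(1664, 16384);  /* same relfilenode, other tablespace */

          printf("keys equal: %d\n", memcmp(a, b, sizeof(DbInfoRelKey)) == 0);
          free(a);
          free(b);
          return 0;
      }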
  8. 24 Feb 2017, 1 commit
  9. 23 Feb 2017, 1 commit
    • Do not initialize relcache before xlog replay. · f8f6587e
      Committed by Ashwin Agrawal, Asim R P and Xin Zhang
      The need for heap access methods before xlog replay is removed by commit
      e2d6aa1481f6cdbd846d4b17b68eb4387dae9211.  This commit simply moves the
      relcache initialization to pass4, where it is really needed.
      
      Do not bother to remove relcache init file at the end of crash recovery pass2.
      
      Error out if relation cache initialized at wrong time.
  10. 21 Dec 2016, 1 commit
    • Add tablespace OID to gp_relation_node and its index. · 8fe321af
      Committed by Ashwin Agrawal
      Relfilenode is only unique within a tablespace. Across tablespaces, the same
      relfilenode may be allocated within a database. Currently, gp_relation_node only
      stores relfilenode and segment_num and has a unique index using only those fields,
      without the tablespace. So it breaks in situations where the same relfilenode gets
      allocated to a table within a database.
  11. 09 Nov 2016, 1 commit
    • Fix relfilenode conflicts. · 88f0623e
      Committed by Heikki Linnakangas
      There was a race condition in the way relfilenodes were chosen, because
      QE nodes chose relfilenodes for existing relations, e.g. at REINDEX or
      TRUNCATE, independently of the master, while for newly created tables,
      the relfilenode was chosen by the master. To fix:
      
      1. If the OID of a newly-created table is already in use as relfilenode
         of a different table in a QE segment, use a different relfilenode.
         (This shouldn't happen in the dispatcher, for the same reasons it
         cannot happen in a single-server PostgreSQL instance)
      
      2. Use a small cache of values recently used for a relfilenode, to close
         a race condition between checking if a relfilenode is in use, and
         actually creating the file
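
      A minimal sketch of the "small cache of recently used values" idea from point 2
      above: remember the last few relfilenodes handed out and refuse to reuse them,
      which closes the window between checking whether a file exists and creating it.
      The cache size and all function names are invented.

      #include <stdbool.h>
      #include <stdio.h>

      typedef unsigned int Oid;

      #define RECENT_CACHE_SIZE 8

      /* Ring buffer of relfilenodes chosen recently in this backend. */
      static Oid recent[RECENT_CACHE_SIZE];
      static int recent_next = 0;

      static bool
      recently_used(Oid relfilenode)
      {
          for (int i = 0; i < RECENT_CACHE_SIZE; i++)
              if (recent[i] == relfilenode)
                  return true;
          return false;
      }

      static void
      remember(Oid relfilenode)
      {
          recent[recent_next] = relfilenode;
          recent_next = (recent_next + 1) % RECENT_CACHE_SIZE;
      }

      /* Stand-in for "does a file with this relfilenode already exist on disk?" */
      static bool
      file_exists(Oid relfilenode)
      {
          return relfilenode == 16385;    /* pretend this one is taken */
      }

      /* Pick the next candidate that is neither on disk nor recently chosen. */
      static Oid
      choose_relfilenode(Oid candidate)
      {
          while (file_exists(candidate) || recently_used(candidate))
              candidate++;
          remember(candidate);
          return candidate;
      }

      int
      main(void)
      {
          printf("chose %u\n", choose_relfilenode(16385));  /* skips the taken value */
          printf("chose %u\n", choose_relfilenode(16385));  /* also skips 16386 now */
          return 0;
      }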
  12. 01 Nov 2016, 1 commit
    • Fix gcc warnings on misleading indentation. · cdfe1917
      Committed by Heikki Linnakangas
      In many places where we had used a mixture of spaces and tabs for
      indentation, new versions of gcc complained about misleading indentation,
      because gcc doesn't know we're using tab width of 4. To fix, make the
      indentation consistent in all the places where gcc gave a warning. Would
      be nice to fix it all around, but that's a lot of work, so let's do it
      in a piecemeal fashion whenever we run into issues or need to modify a
      piece of code anyway.
      
      For some files, especially the GPDB-specific ones, I ran pgindent over
      the whole file. I used the pgindent from PostgreSQL master, which is
      slightly different from what was used back in the 8.3 days, but that's what I had
      easily available, and that's what we're heading to in the future anyway.
      In some cases, I didn't commit the pgindented result if there were
      funnily formatted code or comments that would need special treatment.
      
      For other places, I fixed the indentation locally, just enough to make the
      warnings go away.
      
      I also did a tiny bit of other trivial cleanup, that I happened to spot
      while working on this, although I tried to refrain from anything more
      extensive.
  13. 19 Oct 2016, 1 commit
    • Fix an oversight in merge from upstream. · bdc15033
      Committed by Amil Khanzada
      Relcache init file was failing to load because of duplicate occurrence of
      OpclassOidIndexId in the nailed indexes list.  Consequently, relcache was
      always built from scratch upon every child process startup.  Also added
      OperatorOidIndex from upstream to the nailed index list, to match upstream
      (8.3), as pointed out by Heikki.
      Signed-off-by: Asim R P <apraveen@pivotal.io>
  14. 23 Sep 2016, 2 commits
    • Invalidate persistent TID during relcache invalidation · 90036021
      Committed by Asim R P
      DDL statements, such as reindex, change relfilenode of a relcache
      entry.  This triggers invalidation but we were keeping the old
      persistent TID upon invalidation.  Change this so that persistent TID
      is also reset upon invalidation, if relfilenode of the relcache entry
      has changed.
    • Validate persistent information in relcache entry. · fe39e7eb
      Committed by Shoaib Lari
      We have observed that the persistent TID and serial number associated with a
      relfilenode in a relation descriptor may be updated to a new value before
      relfilenode is updated.  If an XLog record is written with such a relation
      descriptor, then the XLog record fails to apply during recovery.
      
      This commit adds a check to validate sanity of persistent information in a
      relation descriptor by fetching persistent information using relfilenode from
      gp_relation_node.
      
      The validation is controlled by a new GUC, gp_validate_pt_info_relcache.
  15. 01 Sep 2016, 1 commit
    • Check for zero persistentTID in xlog record · 9b2eb416
      Committed by Ashwin Agrawal
      Refactor code to use a common routine to fetch PT info for xlogging. A check can
      be easily added at this common place to validate that persistent info is
      available. Plus, still add a check during recovery for a zero persistentTID. With
      postgres upstream merges it is possible that the function to populate persistent info is
      not called at all, so this check will not hit during xlog record construction
      but at least gives a clear clue during recovery.
  16. 16 Aug 2016, 1 commit
  17. 02 Aug 2016, 1 commit
    • Store pg_appendonly tuple in relcache. · a4fbb150
      Committed by Heikki Linnakangas
      This way, you don't need to always fetch it from the system catalogs,
      which makes things simpler, and is marginally faster too.
      
      To make all the fields in pg_appendonly accessible by direct access to
      the Form_pg_appendonly struct, change 'compresstype' field from text to
      name. "No compression" is now represented by an empty string, rather than
      NULL. I hope there are no applications out there that will get confused
      by this.
      
      The GetAppendOnlyEntry() function used to take a Snapshot as argument,
      but that seems unnecessary. The data in pg_appendonly doesn't change for
      a table after it's been created. Except when it's ALTERed, or rewritten
      by TRUNCATE or CLUSTER, but those operations invalidate the relcache, and
      we're never interested in the old version.
      
      There's not much need for the AppendOnlyEntry struct and the
      GetAppendOnlyEntry() function anymore; you can just as easily just access
      the Form_pg_appendonly struct directly. I'll remove that as a separate
      commit, though, to keep this one more readable.
  18. 16 Jun 2016, 1 commit
    • Revert relcache invalidation at EOX to work like before the merge. · 3b8a4fe2
      Committed by Heikki Linnakangas
      The other FIXME comment that the removed comment refers to was reverted
      earlier, before we pushed the 8.3 merge. Should've uncommented this back
      then, but I missed the need for that because I've been building with
      --enable-cassert. This fixes the regression failure on gpdtm_plpgsql test
      case, when built without assertions.
  19. 28 May 2016, 1 commit
    • Remove Debug_check_for_invalid_persistent_tid option. · b3ec6f00
      Committed by Heikki Linnakangas
      The checks for invalid TIDs are very cheap, a few CPU instructions. It's
      better to catch bugs involving invalid TID early, so let's always check
      for them.
      
      The LOGs in storeAttrDefault() that were also tied to this GUC seemed
      oddly specific. They were probably added a long time ago to hunt for some
      particular bug, and don't seem generally useful, so I just removed them.
  20. 06 May 2016, 1 commit
  21. 19 Mar 2016, 2 commits
    • Validate pg_class pg_type tuple after index fetch. · 0ff952e7
      Committed by Ashwin Agrawal
      While fetching a pg_class or pg_type tuple using an index, perform a sanity
      check to make sure the tuple we intended to read is the tuple the index is
      actually pointing to. This is just a sanity check to make sure the index is not
      broken and is not returning some incorrect tuple, in order to contain the damage.
    • Validate gp_relation_node tuple after index fetch. · c8a21e0d
      Committed by Ashwin Agrawal
      This commit makes sure that while accessing gp_relation_node through its
      index, a sanity check is always performed to verify that the tuple being
      operated on is the intended tuple; if for any reason the index is broken
      and provides a bad tuple, fail the operation instead of causing damage.
      For some scenarios, like deleting a gp_relation_node tuple, it adds an extra
      tuple deform call which was not done earlier, but that doesn't seem heavy
      enough to matter for a DDL operation.
  22. 12 Feb 2016, 1 commit
    • Remove unnecessary #includes. · 9aa7a22f
      Committed by Heikki Linnakangas
      In cdbcat.h, include only the header files that are actually needed for
      the single function prototype in that file. And don't include cdbcat.h
      unnecessarily. A couple of .c files were including cdbcat.h to get
      GpPolicy, but that's actually defined in catalog/gp_policy.h, so #include
      that directly instead where needed.
  23. 12 Jan 2016, 1 commit
    • Make functions in gp_toolkit to execute on all nodes as intended. · 246f7510
      Committed by Heikki Linnakangas
      Moving the installation of gp_toolkit.sql into initdb, in commit f8910c3c,
      broke all the functions that are supposed to execute on all nodes, like
      gp_toolkit.__gp_localid. After that change, gp_toolkit.sql was executed
      in utility mode, and the gp_distribution_policy entries for those functions
      were not created as a result.
      
      To fix, change the code so that gp_distribution_policy entries are never
      created, or consulted, for EXECUTE-type external tables. They have
      more fine-grained information in pg_exttable.location field anyway, so rely
      on that instead. With this change, there is no difference in whether an
      EXECUTE-type external table is created in utility mode or not. We would
      still have similar problems if gp_toolkit contained other kinds of
      external tables, but it doesn't.
      
      This removes the isMasterOnly() function and changes all its callers to
      call GpPolicyFetch() directly instead. Some places used GpPolicyFetch()
      directly to check if a table is distributed, so this just makes that the
      canonical way to do it. The checks for system schemas that used to be in
      isMasterOnly() are no longer performed, but they should've been unnecessary in
      the first place. System tables don't have gp_distribution_policy entries,
      so they'll be treated as master-only even without that check.
  24. 19 Nov 2015, 1 commit
    • Remove gp_upgrade_mode and related machinery. · d9b60cd2
      Committed by Heikki Linnakangas
      The current plan is to use something like pg_upgrade for future in-place
      upgrades. The gpupgrade mechanism will not scale to the kind of drastic
      catalog and other data directory layout changes that are coming as we
      merge with later PostgreSQL releases.
      
      Kept gpupgrademirror for now. Need to check if there's some logic that's
      worth saving, for a possible pg_upgrade based solution later.
  25. 28 Oct 2015, 1 commit
  26. 30 Nov 2012, 1 commit
    • Fix assorted bugs in CREATE INDEX CONCURRENTLY. · 5c8c7c7c
      Committed by Tom Lane
      This patch changes CREATE INDEX CONCURRENTLY so that the pg_index
      flag changes it makes without exclusive lock on the index are made via
      heap_inplace_update() rather than a normal transactional update.  The
      latter is not very safe because moving the pg_index tuple could result in
      concurrent SnapshotNow scans finding it twice or not at all, thus possibly
      resulting in index corruption.
      
      In addition, fix various places in the code that ought to check to make
      sure that the indexes they are manipulating are valid and/or ready as
      appropriate.  These represent bugs that have existed since 8.2, since
      a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
      index behind, and we ought not try to do anything that might fail with
      such an index.
      
      Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
      columns that are allowed to change after initial creation.  Previously we
      could have been left with stale values of some fields in an index relcache
      entry.  It's not clear whether this actually had any user-visible
      consequences, but it's at least a bug waiting to happen.
      
      This is a subset of a patch already applied in 9.2 and HEAD.  Back-patch
      into all earlier supported branches.
      
      Tom Lane and Andres Freund
  27. 17 Aug 2011, 1 commit
    • Fix race condition in relcache init file invalidation. · 8407a11c
      Committed by Tom Lane
      The previous code tried to synchronize by unlinking the init file twice,
      but that doesn't actually work: it leaves a window wherein a third process
      could read the already-stale init file but miss the SI messages that would
      tell it the data is stale.  The result would be bizarre failures in catalog
      accesses, typically "could not read block 0 in file ..." later during
      startup.
      
      Instead, hold RelCacheInitLock across both the unlink and the sending of
      the SI messages.  This is more straightforward, and might even be a bit
      faster since only one unlink call is needed.
      
      This has been wrong since it was put in (in 2002!), so back-patch to all
      supported releases.
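
      A small pthread sketch of the shape of this fix, with a plain mutex standing in
      for RelCacheInitLock: the unlink of the init file and the sending of the
      invalidation messages happen under one lock, so a reader that takes the same
      lock cannot load the stale file and still miss the notification. Everything
      except the name RelCacheInitLock is invented.

      #include <pthread.h>
      #include <stdio.h>

      /* Stand-in for RelCacheInitLock. */
      static pthread_mutex_t init_file_lock = PTHREAD_MUTEX_INITIALIZER;

      static void
      unlink_init_file(void)
      {
          printf("unlink stale relcache init file\n");
      }

      static void
      send_invalidation_messages(void)
      {
          printf("broadcast SI messages marking cached data stale\n");
      }

      /*
       * Do both steps under the same lock, instead of unlinking twice.  A reader
       * that also takes the lock before trusting the init file either sees the
       * file gone or sees the invalidation messages; there is no window in between.
       */
      static void
      invalidate_init_file(void)
      {
          pthread_mutex_lock(&init_file_lock);
          unlink_init_file();
          send_invalidation_messages();
          pthread_mutex_unlock(&init_file_lock);
      }

      int
      main(void)
      {
          invalidate_init_file();
          return 0;
      }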
  28. 23 Mar 2011, 1 commit
    • Avoid potential deadlock in InitCatCachePhase2(). · cf735470
      Committed by Tom Lane
      Opening a catcache's index could require reading from that cache's own
      catalog, which of course would acquire AccessShareLock on the catalog.
      So the original coding here risks locking index before heap, which could
      deadlock against another backend trying to get exclusive locks in the
      normal order.  Because InitCatCachePhase2 is only called when a backend
      has to start up without a relcache init file, the deadlock was seldom seen
      in the field.  (And by the same token, there's no need to worry about any
      performance disadvantage; so not much point in trying to distinguish
      exactly which catalogs have the risk.)
      
      Bug report, diagnosis, and patch by Nikhil Sontakke.  Additional commentary
      by me.  Back-patch to all supported branches.
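
      A compact pthread illustration of the ordering rule behind the fix: if every
      code path locks the catalog before its index, two backends cannot wait on each
      other in opposite orders. The mutexes are stand-ins for heavyweight locks, not
      the real locking code.

      #include <pthread.h>
      #include <stdio.h>

      /* Stand-ins for locks on a system catalog and on one of its indexes. */
      static pthread_mutex_t catalog_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t index_lock   = PTHREAD_MUTEX_INITIALIZER;

      /*
       * Safe pattern: always take the catalog (heap) lock first, then the index
       * lock.  Mixing index-before-catalog in one backend with catalog-before-index
       * in another is what can deadlock.
       */
      static void
      open_catalog_index(const char *who)
      {
          pthread_mutex_lock(&catalog_lock);
          pthread_mutex_lock(&index_lock);

          printf("%s: scanning catalog via its index\n", who);

          pthread_mutex_unlock(&index_lock);
          pthread_mutex_unlock(&catalog_lock);
      }

      int
      main(void)
      {
          open_catalog_index("backend 1");
          open_catalog_index("backend 2");
          return 0;
      }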
  29. 02 Sep 2010, 1 commit
    • Fix up flushing of composite-type typcache entries to be driven directly by · a15e220a
      Committed by Tom Lane
      SI invalidation events, rather than indirectly through the relcache.
      
      In the previous coding, we had to flush a composite-type typcache entry
      whenever we discarded the corresponding relcache entry.  This caused problems
      at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
      from Jeff Davis, and might result in real-world problems given the kind of
      unexpected relcache flush that that test mechanism is intended to model.
      
      The new coding decouples relcache and typcache management, which is a good
      thing anyway from a structural perspective.  The cost is that we have to
      search the typcache linearly to find entries that need to be flushed.  There
      are a couple of ways we could avoid that, but at the moment it's not clear
      it's worth any extra trouble, because the typcache contains very few entries
      in typical operation.
      
      Back-patch to 8.2, the same as some other recent fixes in this general area.
      The patch could be carried back to 8.0 with some additional work, but given
      that it's only hypothetical whether we're fixing any problem observable in
      the field, it doesn't seem worth the work now.
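
      A tiny C sketch of the "search the typcache linearly" approach mentioned above:
      on an invalidation for a given relation OID, walk the cached composite-type
      entries and mark the matching ones stale. The array and entry layout are
      invented; the real typcache is a hash table with more fields.

      #include <stdbool.h>
      #include <stdio.h>

      typedef unsigned int Oid;

      /* Hypothetical cached entry for a composite type backed by a relation. */
      typedef struct TypeCacheEnt
      {
          Oid  typrelid;       /* pg_class OID of the backing relation */
          bool tupdesc_valid;  /* is the cached tuple descriptor still usable? */
      } TypeCacheEnt;

      static TypeCacheEnt typcache[] = {
          {16385, true},
          {16390, true},
          {16402, true},
      };

      /*
       * Invalidation callback: the typcache holds few entries in typical operation,
       * so a linear scan is cheap compared with maintaining an extra index by
       * relation OID.
       */
      static void
      flush_composite_type(Oid relid)
      {
          for (int i = 0; i < (int) (sizeof(typcache) / sizeof(typcache[0])); i++)
              if (typcache[i].typrelid == relid)
                  typcache[i].tupdesc_valid = false;
      }

      int
      main(void)
      {
          flush_composite_type(16390);
          for (int i = 0; i < 3; i++)
              printf("rel %u cached tupdesc valid: %d\n",
                     typcache[i].typrelid, typcache[i].tupdesc_valid);
          return 0;
      }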
  30. 15 Apr 2010, 1 commit
    • Fix a problem introduced by my patch of 2010-01-12 that revised the way · 32616fb1
      Committed by Tom Lane
      relcache reload works.  In the patched code, a relcache entry in process of
      being rebuilt doesn't get unhooked from the relcache hash table; which means
      that if a cache flush occurs due to sinval queue overrun while we're
      rebuilding it, the entry could get blown away by RelationCacheInvalidate,
      resulting in crash or misbehavior.  Fix by ensuring that an entry being
      rebuilt has positive refcount, so it won't be seen as a target for removal
      if a cache flush occurs.  (This will mean that the entry gets rebuilt twice
      in such a scenario, but that's okay.)  It appears that the problem can only
      arise within a transaction that has previously reassigned the relfilenode of
      a pre-existing table, via TRUNCATE or a similar operation.  Per bug #5412
      from Rusty Conover.
      
      Back-patch to 8.2, same as the patch that introduced the problem.
      I think that the failure can't actually occur in 8.2, since it lacks the
      rd_newRelfilenodeSubid optimization, but let's make it work like the later
      branches anyway.
      
      Patch by Heikki, slightly editorialized on by me.
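
      A bare-bones C sketch of the refcount guard described above: bumping a
      reference count before rebuilding an entry makes a concurrent
      "drop everything with refcount zero" pass skip it, so the entry being rebuilt
      cannot be blown away from under us. The structures are invented; only the
      shape of the fix is shown.

      #include <stdio.h>

      typedef unsigned int Oid;

      /* Hypothetical cache entry. */
      typedef struct CacheEnt
      {
          Oid relid;
          int refcount;
          int valid;
      } CacheEnt;

      static CacheEnt cache[] = {{16385, 0, 1}, {16390, 0, 1}};

      /* Cache reset: only entries nobody holds a reference to are discarded. */
      static void
      cache_invalidate_all(void)
      {
          for (int i = 0; i < 2; i++)
              if (cache[i].refcount == 0)
                  cache[i].valid = 0;
      }

      /* Rebuild with the refcount held, so a concurrent reset cannot remove the entry. */
      static void
      rebuild_entry(CacheEnt *ent)
      {
          ent->refcount++;         /* pin: the entry now survives a reset */
          cache_invalidate_all();  /* pretend an sinval overflow hits mid-rebuild */
          ent->valid = 1;          /* finish rebuilding an entry that still exists */
          ent->refcount--;
      }

      int
      main(void)
      {
          rebuild_entry(&cache[0]);
          printf("entry 16385 valid=%d, entry 16390 valid=%d\n",
                 cache[0].valid, cache[1].valid);
          return 0;
      }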
  31. 14 Jan 2010, 1 commit
    • When loading critical system indexes into the relcache, ensure we lock the · 8a6a40de
      Committed by Tom Lane
      underlying catalog not only the index itself.  Otherwise, if the cache
      load process touches the catalog (which will happen for many though not
      all of these indexes), we are locking index before parent table, which can
      result in a deadlock against processes that are trying to lock them in the
      normal order.  Per today's failure on buildfarm member gothic_moth; it's
      surprising the problem hadn't been identified before.
      
      Back-patch to 8.2.  Earlier releases didn't have the issue because they
      didn't try to lock these indexes during load (instead assuming that they
      couldn't change schema at all during multiuser operation).
  32. 13 Jan 2010, 1 commit
    • Fix relcache reload mechanism to be more robust in the face of errors · d4b7cf06
      Committed by Tom Lane
      occurring during a reload, such as query-cancel.  Instead of zeroing out
      an existing relcache entry and rebuilding it in place, build a new relcache
      entry, then swap its contents with the old one, then free the new entry.
      This avoids problems with code believing that a previously obtained pointer
      to a cache entry must still reference a valid entry, as seen in recent
      failures on buildfarm member jaguar.  (jaguar is using CLOBBER_CACHE_ALWAYS
      which raises the probability of failure substantially, but the problem
      could occur in the field without that.)  The previous design was okay
      when it was made, but subtransactions and the ResourceOwner mechanism
      make it unsafe now.
      
      Also, make more use of the already existing rd_isvalid flag, so that we
      remember that the entry requires rebuilding even if the first attempt fails.
      
      Back-patch as far as 8.2.  Prior versions have enough issues around relcache
      reload anyway (due to inadequate locking) that fixing this one doesn't seem
      worthwhile.
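
      A short C sketch of the rebuild strategy described above: build a fresh entry,
      swap its contents with the old struct so existing pointers to the old struct
      stay valid, then free the temporary. Names are invented; the real code swaps
      selected fields of the relcache entry rather than the whole struct.

      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical cache entry; other code holds pointers to the struct itself. */
      typedef struct Entry
      {
          unsigned int relfilenode;
          int          isvalid;
      } Entry;

      static Entry *
      build_entry(unsigned int relfilenode)
      {
          Entry *e = calloc(1, sizeof(Entry));

          e->relfilenode = relfilenode;
          e->isvalid = 1;
          return e;
      }

      /*
       * Rebuild by swapping contents: if building the new entry errors out partway
       * (e.g. on query cancel), the old struct is untouched and callers' pointers
       * still reference a coherent entry.
       */
      static void
      reload_entry(Entry *old, unsigned int new_relfilenode)
      {
          Entry *fresh = build_entry(new_relfilenode);
          Entry  tmp   = *old;

          *old = *fresh;   /* callers keep their pointer to "old" */
          *fresh = tmp;
          free(fresh);     /* discard the shell now holding the stale contents */
      }

      int
      main(void)
      {
          Entry *e = build_entry(16385);
          Entry *held_elsewhere = e;  /* e.g. a pointer kept by other code */

          reload_entry(e, 16402);
          printf("same struct: %d, relfilenode now %u\n",
                 held_elsewhere == e, held_elsewhere->relfilenode);
          free(e);
          return 0;
      }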
  33. 27 Sep 2009, 1 commit
    • Fix RelationCacheInitializePhase2 (Phase3, in HEAD) to cope with the · 8b720b57
      Committed by Tom Lane
      possibility of shared-inval messages causing a relcache flush while it tries
      to fill in missing data in preloaded relcache entries.  There are actually
      two distinct failure modes here:
      
      1. The flush could delete the next-to-be-processed cache entry, causing
      the subsequent hash_seq_search calls to go off into the weeds.  This is
      the problem reported by Michael Brown, and I believe it also accounts
      for bug #5074.  The simplest fix is to restart the hashtable scan after
      we've read any new data from the catalogs.  It appears that pre-8.4
      branches have not suffered from this failure, because by chance there were
      no other catalogs sharing the same hash chains with the catalogs that
      RelationCacheInitializePhase2 had work to do for.  However that's obviously
      pretty fragile, and it seems possible that derivative versions with
      additional system catalogs might be vulnerable, so I'm back-patching this
      part of the fix anyway.
      
      2. The flush could delete the *current* cache entry, in which case the
      pointer to the newly-loaded data would end up being stored into an
      already-deleted Relation struct.  As long as it was still deleted, the only
      consequence would be some leaked space in CacheMemoryContext.  But it seems
      possible that the Relation struct could already have been recycled, in
      which case this represents a hard-to-reproduce clobber of cached data
      structures, with unforeseeable consequences.  The fix here is to pin the
      entry while we work on it.
      
      In passing, also change RelationCacheInitializePhase2 to Assert that
      formrdesc() set up the relation's cached TupleDesc (rd_att) with the
      correct type OID and hasoids values.  This is more appropriate than
      silently updating the values, because the original tupdesc might already
      have been copied into the catcache.  However this part of the patch is
      not in HEAD because it fails due to some questionable recent changes in
      formrdesc :-(.  That will be cleaned up in a subsequent patch.
  34. 11 Jun 2009, 1 commit
  35. 01 Apr 2009, 1 commit
    • Modify the relcache to record the temp status of both local and nonlocal · 948d6ec9
      Committed by Tom Lane
      temp relations; this is no more expensive than before, now that we have
      pg_class.relistemp.  Insert tests into bufmgr.c to prevent attempting
      to fetch pages from nonlocal temp relations.  This provides a low-level
      defense against bugs-of-omission allowing temp pages to be loaded into shared
      buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop.
      While at it, tweak a bunch of places to use new relcache tests (instead of
      expensive probes into pg_namespace) to detect local or nonlocal temp tables.
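
      A minimal C sketch of the low-level defense described above: the relation's
      cached temp status is checked before reading any of its pages into shared
      buffers, so a bug elsewhere cannot quietly pull another session's temp pages
      in. The struct and the islocaltemp field are simplified stand-ins; only
      relistemp comes from the commit.

      #include <stdbool.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical slimmed-down relcache entry. */
      typedef struct RelDesc
      {
          const char *relname;
          bool        istemp;       /* temp table? (cf. pg_class.relistemp) */
          bool        islocaltemp;  /* temp table owned by this backend? */
      } RelDesc;

      /* Guard: never load pages of another session's temp table into shared buffers. */
      static void
      read_buffer(const RelDesc *rel, unsigned int blocknum)
      {
          if (rel->istemp && !rel->islocaltemp)
          {
              fprintf(stderr, "cannot access temporary table \"%s\" of another session\n",
                      rel->relname);
              exit(1);
          }
          printf("reading block %u of \"%s\"\n", blocknum, rel->relname);
      }

      int
      main(void)
      {
          RelDesc mine   = {"my_temp_tab", true, true};
          RelDesc theirs = {"their_temp_tab", true, false};

          read_buffer(&mine, 0);
          read_buffer(&theirs, 0);  /* refused */
          return 0;
      }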