1. 18 Jul 2013, 5 commits
    • Move checking an explicit VARIADIC "any" argument into the parser. · d26888bc
      Andrew Dunstan committed
      This is more efficient and simpler. It does mean that an untyped NULL
      can no longer be used in such cases, which should be mentioned in
      Release Notes, but doesn't seem a terrible loss. The workaround is to
      cast the NULL to some array type.
      
      Pavel Stehule, reviewed by Jeevan Chalke.
    • Fix direct access to Relation->rd_indpred. · 405a468b
      Tom Lane committed
      Should use RelationGetIndexPredicate(), since rd_indpred is just a cache
      that is not computed until/unless demanded.  Per buildfarm failure on
      CLOBBER_CACHE_ALWAYS animals; diagnosis and fix by Hitoshi Harada.
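The rule behind this fix, that a lazily filled cache field must be read through its accessor, can be illustrated with a small C sketch. The struct, fields, and functions below (`rel_cache`, `get_predicate()`, `compute_predicate()`) are hypothetical stand-ins for `rd_indpred` and `RelationGetIndexPredicate()`, not PostgreSQL's relcache code.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a relcache entry with a lazily computed field,
 * in the spirit of rd_indpred; none of these names are PostgreSQL's.
 */
typedef struct
{
    bool pred_valid;    /* has the cached value been computed yet? */
    int  pred;          /* cached value; meaningless while !pred_valid */
    int  compute_calls; /* how many times the expensive path ran */
} rel_cache;

/* Stand-in for the expensive work (e.g. reading and parsing a catalog row). */
static int compute_predicate(rel_cache *r)
{
    r->compute_calls++;
    return 42;
}

/*
 * Accessor: fills the cache on first use. Code that reads r->pred directly
 * sees an unfilled value whenever nothing has called the accessor yet, or
 * the cache has just been flushed, which CLOBBER_CACHE_ALWAYS builds force
 * at every opportunity.
 */
static int get_predicate(rel_cache *r)
{
    if (!r->pred_valid)
    {
        r->pred = compute_predicate(r);
        r->pred_valid = true;
    }
    return r->pred;
}
```

Calling `get_predicate()` twice runs `compute_predicate()` only once; the second call is served from the cache.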
    • Fix variable names mentioned in comment to match the code. · 107cbc90
      Heikki Linnakangas committed
      Also, in another comment, explain why holding an insertion slot is a
      critical section.
      
      Per review by Amit Kapila.
    • Fix assert failure at end of recovery, broken by XLogInsert scaling patch. · 59c02a36
      Heikki Linnakangas committed
      Initialization of the first XLOG buffer at end-of-recovery was broken for
      the case that the last read WAL record ended at a page boundary. Instead of
      trying to copy the last full xlog page to the buffer cache in that case,
      just set shared state so that the next page is initialized when the first
      WAL record after startup is inserted. (That's what we did in earlier
      versions, too.)
      
      To make the shared state required for that case less surprising, replace the
      XLogCtl->curridx variable, which was the index of the latest initialized
      buffer, with an XLogRecPtr of how far the buffers have been initialized.
      That also allows us to get rid of the XLogRecEndPtrToBufIdx macro.
      
      While we're at it, make a similar change for XLogCtl->Write.curridx, getting
      rid of that variable and calculating the next buffer to write from
      XLogCtl->LogwrtResult instead.
    • Fix end-of-loop optimization in pglz_find_match() function. · 3f2adace
      Heikki Linnakangas committed
      After the recent pglz optimization patch, the next/prev pointers in the
      hash table are never NULL; INVALID_ENTRY_PTR is used to represent invalid
      entries instead. The end-of-loop check in pglz_find_match() function didn't
      get the memo. The result was the same from a correctness point of view, but
      because the NULL-check would never fail, the tiny optimization turned into
      a pessimization.
      
      Reported by Stephen Frost, using Coverity scanner.
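A small sketch of why a NULL test stops doing anything once chains end in a sentinel. It assumes a hypothetical array-backed chain in which slot 0 plays the role of `INVALID_ENTRY_PTR`; it is not the actual pglz data structure.

```c
/* Hypothetical chained entry; chains end at a sentinel, never at NULL. */
typedef struct hist_entry
{
    struct hist_entry *next;
} hist_entry;

/* Slot 0 is reserved as the sentinel (an assumption of this sketch). */
static hist_entry entries[16];
#define INVALID_ENTRY_PTR (&entries[0])

/*
 * Walk a chain until the sentinel. For every real entry, next points at
 * another entry or at the sentinel, so a test like "hent->next == NULL"
 * is never true and an early exit keyed on it never fires.
 */
static int chain_length(const hist_entry *hent)
{
    int n = 0;

    while (hent != INVALID_ENTRY_PTR)
    {
        n++;
        hent = hent->next;
    }
    return n;
}
```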
  2. 17 Jul 2013, 5 commits
    • Fix systable_recheck_tuple() for MVCC scan snapshots. · ffcf6545
      Noah Misch committed
      Since this function assumed non-MVCC snapshots, it broke when commit
      568d4138 switched its one caller from
      SnapshotNow scans to MVCC-snapshot scans.
      
      Reviewed by Robert Haas, Tom Lane and Andres Freund.
    • Implement the FILTER clause for aggregate function calls. · b560ec1b
      Noah Misch committed
      This is SQL-standard with a few extensions, namely support for
      subqueries and outer references in clause expressions.
      
      catversion bump due to change in Aggref and WindowFunc.
      
      David Fetter, reviewed by Dean Rasheed.
    • Comment on why planagg.c punts "MIN(x ORDER BY y)". · 7a8e9f29
      Noah Misch committed
    • Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. · cc1965a9
      Kevin Grittner committed
      This allows reads to continue without any blocking while a REFRESH
      runs.  The new data appears atomically as part of transaction
      commit.
      
      Review questioned the Assert that a matview was not a system
      relation.  This will be addressed separately.
      
      Reviewed by Hitoshi Harada, Robert Haas, Andres Freund.
      Merged after review with security patch f3ab5d46.
    • Allow background workers to be started dynamically. · 7f7485a0
      Robert Haas committed
      There is a new API, RegisterDynamicBackgroundWorker, which allows
      an ordinary user backend to register a new background worker during
      normal running.  This means that it's no longer necessary for all
      background workers to be registered during processing of
      shared_preload_libraries, although the option of registering workers
      at that time remains available.
      
      When a background worker exits and will not be restarted, the
      slot previously used by that background worker is automatically
      released and becomes available for reuse.  Slots used by background
      workers that are configured for automatic restart can't (yet) be
      released without shutting down the system.
      
      This commit adds a new source file, bgworker.c, and moves some
      of the existing control logic for background workers there.
      Previously, there was little enough logic that it made sense to
      keep everything in postmaster.c, but not any more.
      
      This commit also makes the worker_spi contrib module into an
      extension and adds a new function, worker_spi_launch, which can
      be used to demonstrate the new facility.
  3. 16 Jul 2013, 2 commits
    • Check get_tle_by_resno() result before deref · 4ed22e89
      Stephen Frost committed
      When creating a sort to support a group by, we need to look up the
      target entry in the target list by the resno using get_tle_by_resno().
      This particular code-path didn't check the result prior to attempting
      to dereference it, while all other callers did.  While I can't see a
      way for this usage of get_tle_by_resno() to fail (you can't ask for
      a column to be sorted on which isn't included in the group by), it's
      probably best to check that we didn't end up with a NULL somehow
      anyway than risk the segfault.
      
      I'm willing to back-patch this if others feel it's necessary, but my
      guess is new features are what might tickle this rather than anything
      existing.
      
      Missing check spotted by the Coverity scanner.
    • Assert that syscache lookups don't happen outside transactions. · 42c80c69
      Robert Haas committed
      Andres Freund
  4. 15 Jul 2013, 1 commit
    • Ensure 64bit arithmetic when calculating tapeSpace · 273dcd16
      Stephen Frost committed
      In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring
      out how many 'tapes' we can use (maxTapes) and then multiplying the
      result by the tape buffer overhead for each.  Unfortunately, when
      we are on a system with an 8-byte long, we allow work_mem to be
      larger than 2GB and that allows maxTapes to be large enough that the
      32bit arithmetic can overflow when multiplied against the buffer
      overhead.
      
      When this overflow happens, we end up adding the overflow to the
      amount of space available, causing the amount of memory allocated to
      be larger than work_mem.
      
      Note that to reach this point, you have to set work_mem to at least
      24GB and be sorting a set which is at least that size.  Given that a
      user who can set work_mem to 24GB could also set it even higher, if
      they were looking to run the system out of memory, this isn't
      considered a security issue.
      
      This overflow risk was found by the Coverity scanner.
      
      Back-patch to all supported branches, as this issue has existed
      since before 8.4.
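The general fix for this class of overflow is to widen one operand before multiplying, so the product is computed in 64 bits. A minimal sketch with a hypothetical `tape_space()` helper and an assumed per-tape overhead constant (an illustrative value, not tuplesort.c's):

```c
#include <stdint.h>

/* Assumed per-tape buffer overhead in bytes (illustrative value only). */
#define TAPE_BUFFER_OVERHEAD 8192

/*
 * Writing "max_tapes * TAPE_BUFFER_OVERHEAD" with two int operands would do
 * the multiplication in int, which can overflow before the result is ever
 * widened. Casting one operand to int64_t first makes the product 64-bit.
 */
static int64_t tape_space(int max_tapes)
{
    return (int64_t) max_tapes * TAPE_BUFFER_OVERHEAD;
}
```

With 300000 tapes the result, 2457600000, exceeds `INT32_MAX`, which is exactly the case the 32-bit product would have gotten wrong.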
  5. 13 Jul 2013, 3 commits
    • Add session_preload_libraries configuration parameter · 070518dd
      Peter Eisentraut committed
      This is like shared_preload_libraries except that it takes effect at
      backend start and can be changed without a full postmaster restart.  It
      is like local_preload_libraries except that it is still only settable by
      a superuser.  This can be a better way to load modules such as
      auto_explain.
      
      Since there are now three preload parameters, regroup the documentation
      a bit.  Put all parameters into one section, explain common
      functionality only once, update the descriptions to reflect current and
      future realities.
      Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>
    • Switch user ID to the object owner when populating a materialized view. · f3ab5d46
      Noah Misch committed
      This makes superuser-issued REFRESH MATERIALIZED VIEW safe regardless of
      the object's provenance.  REINDEX is an earlier example of this pattern.
      As a downside, functions called from materialized views must tolerate
      running in a security-restricted operation.  CREATE MATERIALIZED VIEW
      need not change user ID.  Nonetheless, avoid creation of materialized
      views that will invariably fail REFRESH by making it, too, start a
      security-restricted operation.
      
      Back-patch to 9.3 so materialized views have this from the beginning.
      
      Reviewed by Kevin Grittner.
    • N
  6. 10 Jul 2013, 1 commit
  7. 09 Jul 2013, 1 commit
    • Fix bool abuse · 7888c612
      Peter Eisentraut committed
      path_encode's "closed" argument used to take three values: TRUE, FALSE,
      or -1, while being of type bool.  Replace that with a three-valued enum
      for more clarity.
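A minimal sketch of the three-valued enum approach. The enum and helper names are illustrative assumptions; the delimiters follow PostgreSQL's textual path syntax, where `[ ]` marks an open path and `( )` a closed one.

```c
/*
 * Hypothetical three-valued enum replacing a bool parameter that was also
 * passed -1. The names are illustrative.
 */
typedef enum
{
    PATH_NONE,      /* no enclosing delimiters */
    PATH_OPEN,      /* open path, written as [ ... ] */
    PATH_CLOSED     /* closed path, written as ( ... ) */
} path_delim;

/* Opening delimiter for a path, or '\0' when none should be written. */
static char path_open_char(path_delim delim)
{
    switch (delim)
    {
        case PATH_OPEN:
            return '[';
        case PATH_CLOSED:
            return '(';
        case PATH_NONE:
            break;
    }
    return '\0';
}
```

Unlike a bool that secretly carries -1, every state has a name, and compilers can warn (e.g. GCC's `-Wswitch`) when a switch over the enum misses a case.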
  8. 08 Jul 2013, 3 commits
    • Fix Windows build. · f489470f
      Heikki Linnakangas committed
      Was broken by my xloginsert scaling patch. The XLogCtl global variable
      needs to be initialized in each process, as on Windows it is not
      inherited through fork().
    • Improve scalability of WAL insertions. · 9a20a9b2
      Heikki Linnakangas committed
      This patch replaces WALInsertLock with a number of WAL insertion slots,
      allowing multiple backends to insert WAL records into the WAL buffers
      concurrently. This is particularly useful for parallel loading of large
      amounts of data on a system with many CPUs.
      
      This has one user-visible change: switching to a new WAL segment with
      pg_switch_xlog() now fills the remaining unused portion of the segment with
      zeros. This potentially adds some overhead, but it has been very common
      practice for DBAs to clear the "tail" of the segment with an external
      pg_clearxlogtail utility anyway, to make the WAL files compress better.
      With this patch, it's no longer necessary to do that.
      
      This patch adds a new GUC, xloginsert_slots, to tune the number of WAL
      insertion slots. Performance testing suggests that the default, 8, works
      pretty well for all kinds of workloads, but I left the GUC in place to allow
      others with different hardware to test that easily. We might want to remove
      that before release.
      
      Reviewed by Andres Freund.
    • Fix planning of parameterized appendrel paths with expensive join quals. · 5372275b
      Tom Lane committed
      The code in set_append_rel_pathlist() for building parameterized paths
      for append relations (inheritance and UNION ALL combinations) assumed
      that the cheapest regular path for a child relation would still be cheapest
      when reparameterized.  That might not be the case, particularly if the
      added join conditions are expensive to compute, as in a recent example from
      Jeff Janes.  Fix it to compare child path costs *after* reparameterizing.
      We can short-circuit that if the cheapest pre-existing path is already
      parameterized correctly, which seems likely to be true often enough to be
      worth checking for.
      
      Back-patch to 9.2 where parameterized paths were introduced.
  9. 07 Jul 2013, 1 commit
    • Handle posix_fallocate() errors. · 5b571bb8
      Jeff Davis committed
      On some platforms, posix_fallocate() is available but may still return
      EINVAL if the underlying filesystem does not support it.  So, in case
      of an error, fall through to the alternate implementation that just
      writes zeros.
      
      Per buildfarm failure and analysis by Tom Lane.
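A sketch of the fallback described above, using a hypothetical `allocate_file()` helper intended for a freshly created file. Note that `posix_fallocate()` reports failure by returning an error number rather than setting `errno`, so any nonzero result sends us down the zero-filling path.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make a new file fd at least len bytes long.
 * posix_fallocate() returns an error number directly (it does not set
 * errno); on any failure, e.g. EINVAL from a filesystem that does not
 * support it, fall back to writing zeros from offset 0.
 */
static int allocate_file(int fd, off_t len)
{
    char  zbuf[8192];
    off_t done = 0;

    if (posix_fallocate(fd, 0, len) == 0)
        return 0;

    memset(zbuf, 0, sizeof(zbuf));
    while (done < len)
    {
        off_t   remaining = len - done;
        size_t  chunk = remaining < (off_t) sizeof(zbuf)
                        ? (size_t) remaining : sizeof(zbuf);
        ssize_t rc = pwrite(fd, zbuf, chunk, done);

        if (rc <= 0)
            return -1;          /* write error or no progress */
        done += rc;
    }
    return 0;
}
```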
  10. 06 Jul 2013, 2 commits
  11. 05 Jul 2013, 3 commits
    • Expose the estimation of number of changed tuples since last analyze · c87ff71f
      Magnus Hagander committed
      This value, now pg_stat_all_tables.n_mod_since_analyze, was already
      tracked and used by autovacuum, but not exposed to the user.
      
      Mark Kirkwood, review by Laurenz Albe
    • Use type "int64" for memory accounting in tuplesort.c/tuplestore.c. · 79e0f87a
      Noah Misch committed
      Commit 263865a4 switched tuplesort.c and
      tuplestore.c variables representing memory usage from type "long" to
      type "Size".  This was unnecessary; I thought doing so avoided overflow
      scenarios on 64-bit Windows, but guc.c already limited work_mem so as to
      prevent the overflow.  It was also incomplete, not touching the logic
      that assumed a signed data type.  Change the affected variables to
      "int64".  This is perfect for 64-bit platforms, and it reduces the need
      to contemplate platform-specific overflow scenarios.  It also puts us
      close to being able to support work_mem over 2 GiB on 64-bit Windows.
      
      Per report from Andres Freund.
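To see why the accounting logic needs a signed type, consider a remaining-memory budget that is allowed to go negative when an allocation overshoots it. The names below are hypothetical; with an unsigned type such as `Size`, the subtraction would wrap around and a `< 0` test could never fire.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical memory budget. avail_mem is signed on purpose: it may go
 * negative after an allocation overshoots the budget. With an unsigned
 * type the subtraction would wrap to a huge positive value instead.
 */
typedef struct
{
    int64_t avail_mem;      /* bytes still available; < 0 means over budget */
} mem_budget;

static void use_mem(mem_budget *b, int64_t nbytes)
{
    b->avail_mem -= nbytes;
}

static void free_mem(mem_budget *b, int64_t nbytes)
{
    b->avail_mem += nbytes;
}

/* Over budget? This test is only meaningful for a signed counter. */
static bool lack_mem(const mem_budget *b)
{
    return b->avail_mem < 0;
}
```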
    • Fix typo in comment. · 7842d41d
      Fujii Masao committed
      Michael Paquier
  12. 04 Jul 2013, 3 commits
    • Add new GUC, max_worker_processes, limiting number of bgworkers. · 6bc8ef0b
      Robert Haas committed
      In 9.3, there's no particular limit on the number of bgworkers;
      instead, we just count up the number that are actually registered,
      and use that to set MaxBackends.  However, that approach causes
      problems for Hot Standby, which needs both MaxBackends and the
      size of the lock table to be the same on the standby as on the
      master, yet it may not be desirable to run the same bgworkers in
      both places.  9.3 handles that by failing to notice the problem,
      which will probably work fine in nearly all cases anyway, but is
      not theoretically sound.
      
      A further problem with simply counting the number of registered
      workers is that new workers can't be registered without a
      postmaster restart.  This is inconvenient for administrators,
      since bouncing the postmaster causes an interruption of service.
      Moreover, there are a number of applications for background
      processes where, by necessity, the background process must be
      started on the fly (e.g. parallel query).  While this patch
      doesn't actually make it possible to register new background
      workers after startup time, it's a necessary prerequisite.
      
      Patch by me.  Review by Michael Paquier.
    • Get rid of pg_class.reltoastidxid. · 2ef085d0
      Fujii Masao committed
      Treat TOAST indexes just the same as normal ones and get the OID
      of a TOAST index from pg_index rather than pg_class.reltoastidxid.
      This change allows us to handle multiple TOAST indexes, which is
      required infrastructure for the upcoming
      REINDEX CONCURRENTLY feature.
      
      Patch by Michael Paquier, reviewed by Andres Freund and me.
    • Fix handling of auto-updatable views on inherited tables. · 5530a826
      Tom Lane committed
      An INSERT into such a view should work just like an INSERT into its base
      table, i.e. the insertion should go directly into that table ... not be
      duplicated into each child table, as was happening before, per bug #8275
      from Rushabh Lathia.  On the other hand, the current behavior for
      UPDATE/DELETE seems reasonable: the update/delete traverses the child
      tables, or not, depending on whether the view specifies ONLY or not.
      Add some regression tests covering this area.
      
      Dean Rasheed
  13. 03 Jul 2013, 2 commits
    • Unbreak postmaster restart-after-crash sequence · 620935ad
      Alvaro Herrera committed
      In patch 82233ce7, AbortStartTime wasn't being reset appropriately
      after the restart sequence, causing subsequent iterations through
      ServerLoop to malfunction.
    • Add support for multiple kinds of external toast datums. · 36820250
      Robert Haas committed
      To that end, support tags rather than lengths for external datums.
      As an example of how this can be used, add support for "indirect"
      tuples which point to some externally allocated memory containing
      a toast tuple.  Similar infrastructure could be used for other
      purposes, including, perhaps, support for alternative compression
      algorithms.
      
      Andres Freund, reviewed by Hitoshi Harada and myself
  14. 02 Jul 2013, 3 commits
    • Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. · 568d4138
      Robert Haas committed
      SnapshotNow scans have the undesirable property that, in the face of
      concurrent updates, the scan can fail to see either the old or the new
      versions of the row.  In many cases, we work around this by requiring
      DDL operations to hold AccessExclusiveLock on the object being
      modified; in some cases, the existing locking is inadequate and random
      failures occur as a result.  This commit doesn't change anything
      related to locking, but will hopefully pave the way to allowing lock
      strength reductions in the future.
      
      The major issue that has held us back from making this change in the past
      is that taking an MVCC snapshot is significantly more expensive than
      using a static special snapshot such as SnapshotNow.  However, testing
      of various worst-case scenarios reveals that this problem is not
      severe except under fairly extreme workloads.  To mitigate those
      problems, we avoid retaking the MVCC snapshot for each new scan;
      instead, we take a new snapshot only when invalidation messages have
      been processed.  The catcache machinery already requires that
      invalidation messages be sent before releasing the related heavyweight
      lock; else other backends might rely on locally-cached data rather
      than scanning the catalog at all.  Thus, making snapshot reuse
      dependent on the same guarantees shouldn't break anything that wasn't
      already subtly broken.
      
      Patch by me.  Review by Michael Paquier and Andres Freund.
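The snapshot-reuse rule can be modeled with a counter that is bumped whenever invalidation messages are processed: the cached catalog snapshot is retaken only if the counter has moved since it was taken. This is a hypothetical sketch of the idea; PostgreSQL's actual catalog-snapshot bookkeeping does not necessarily look like this.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: a counter bumped whenever invalidation messages are
 * processed, and a cached catalog snapshot tagged with the counter value
 * at the time it was taken.
 */
static uint64_t inval_counter = 0;
static uint64_t snapshot_taken_at = 0;
static bool     snapshot_valid = false;
static int      snapshots_taken = 0;

static void process_invalidation_messages(void)
{
    inval_counter++;
}

/*
 * Return an id for the catalog snapshot in use, retaking it only if
 * invalidation messages have been processed since it was last taken.
 */
static int get_catalog_snapshot(void)
{
    if (!snapshot_valid || snapshot_taken_at != inval_counter)
    {
        snapshots_taken++;      /* stands in for building a new MVCC snapshot */
        snapshot_taken_at = inval_counter;
        snapshot_valid = true;
    }
    return snapshots_taken;
}
```

Repeated scans with no intervening invalidation share one snapshot, which is what keeps the extra cost of MVCC catalog scans low in the common case.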
    • Add a convenience routine makeFuncCall to reduce duplication. · 0d22987a
      Robert Haas committed
      David Fetter and Andrew Gierth, reviewed by Jeevan Chalke
    • Add timezone offset output option to to_char() · 7408c5d2
      Bruce Momjian committed
      Add ability for to_char() to output the timezone's UTC offset (OF).  We
      already have the ability to return the timezone abbreviation (TZ/tz).
      Per request from Andrew Dunstan
  15. 01 Jul 2013, 2 commits
    • Optimize pglz compressor for small inputs. · 031cc55b
      Heikki Linnakangas committed
      The pglz compressor has a significant startup cost, because it has to
      zero-initialize the history-tracking hash table. On a 64-bit system, the
      hash table was 64kB in size. While clearing memory is pretty fast, for very
      short inputs the relative cost of that was quite large.
      
      This patch alleviates that in two ways. First, instead of storing pointers
      in the hash table, store 16-bit indexes into the hist_entries array. That
      slashes the size of the hash table to 1/2 or 1/4 of the original, depending
      on the pointer width. Secondly, adjust the size of the hash table based on
      input size. For very small inputs, you don't need a large hash table to
      avoid collisions.
      
      Review by Amit Kapila.
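A sketch of the two ideas in this commit: bucket heads stored as 16-bit indexes instead of pointers, and a hash size chosen from the input length so that only the buckets actually used get cleared. The array name, bucket bound, and size thresholds are illustrative assumptions, not necessarily pglz's exact values.

```c
#include <stdint.h>
#include <string.h>

#define MAX_HISTORY_LISTS 8192   /* assumed upper bound on hash buckets */

/*
 * Bucket heads hold 16-bit indexes into a separate history-entry array
 * rather than pointers, so each bucket costs 2 bytes instead of 4 or 8.
 */
static int16_t hist_start[MAX_HISTORY_LISTS];

/* Pick a power-of-two hash size that grows with the input length. */
static int choose_hash_size(int32_t input_len)
{
    if (input_len < 128)
        return 512;
    if (input_len < 256)
        return 1024;
    if (input_len < 512)
        return 2048;
    if (input_len < 1024)
        return 4096;
    return MAX_HISTORY_LISTS;
}

/* Clear only the buckets this input will use; return the bucket mask. */
static int reset_history(int32_t input_len)
{
    int hashsz = choose_hash_size(input_len);

    memset(hist_start, 0, (size_t) hashsz * sizeof(hist_start[0]));
    return hashsz - 1;          /* used as "hash & mask" to pick a bucket */
}
```

For a 10-byte input this clears 512 two-byte buckets (1 kB) instead of a full table of 8-byte pointers.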
    • Retry short writes when flushing WAL. · 79ce29c7
      Heikki Linnakangas committed
      We don't normally bother retrying when the number of bytes written by
      write() is short of what was requested. It is generally assumed that a
      write() to disk doesn't return short, unless you run out of disk space.
      While writing the WAL, however, it seems prudent to try a bit harder,
      because a failure leads to PANIC. The write() is also much larger than most
      write()s in the backend (up to wal_buffers), so there's more room for
      surprises.
      
      Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
      nowadays, so it shouldn't happen, but better to be defensive.
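A minimal sketch of a write loop that retries on short writes and on `EINTR`. The helper name is hypothetical, and treating a zero-byte write as out-of-space is an assumption made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all nbytes from buf to fd, retrying on short
 * writes and on EINTR. Returns 0 on success, -1 on error with errno set.
 */
static int write_all(int fd, const char *buf, size_t nbytes)
{
    while (nbytes > 0)
    {
        ssize_t written = write(fd, buf, nbytes);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;
        }
        if (written == 0)
        {
            errno = ENOSPC;     /* assumption: no progress means out of space */
            return -1;
        }
        buf += written;         /* short write: advance and write the rest */
        nbytes -= (size_t) written;
    }
    return 0;
}
```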
  16. 29 Jun 2013, 3 commits