  1. 16 Mar, 2016 · 2 commits
    • Add simple VACUUM progress reporting. · c16dc1ac
      Robert Haas committed
      There's a lot more that could be done here yet - in particular, this
      reports only very coarse-grained information about the index vacuuming
      phase - but even as it stands, the new pg_stat_progress_vacuum can
      tell you quite a bit about what a long-running vacuum is actually
      doing.
      
      Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
      and Rahila Syed.
    • Cope if platform declares mbstowcs_l(), but not locale_t, in <xlocale.h>. · 0e9b8998
      Tom Lane committed
      Previously, we included <xlocale.h> only if necessary to get the definition
      of type locale_t.  According to notes in PGAC_TYPE_LOCALE_T, this is
      important because on some versions of glibc that file supplies an
      incompatible declaration of locale_t.  (This info may be obsolete, because
      on my RHEL6 box that seems to be the *only* definition of locale_t; but
      there may still be glibc versions in the wild for which it's a live concern.)
      
      It turns out though that on FreeBSD and maybe other BSDen, you can get
      locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
      <xlocale.h>.  This was leaving us compiling calls to mbstowcs_l() and
      friends with no visible prototype, which causes a warning and could
      possibly cause actual trouble, since it's not declared to return int.
      
      Hence, adjust the configure checks so that we'll include <xlocale.h>
      either if it's necessary to get type locale_t or if it's necessary to
      get a declaration of mbstowcs_l().
      
      Report and patch by Aleksander Alekseev, somewhat whacked around by me.
      Back-patch to all supported branches, since we have been using
      mbstowcs_l() since 9.1.
  2. 15 Mar, 2016 · 12 commits
    • Add a GetForeignUpperPaths callback function for FDWs. · 101fd934
      Tom Lane committed
      This is basically like the just-added create_upper_paths_hook, but
      control is funneled only to the FDW responsible for all the baserels
      of the current query; so providing such a callback is much less likely
      to add useless overhead than using the hook function is.
      
      The documentation is a bit sketchy.  We'll likely want to improve it,
      and/or adjust the call conventions, when we get some experience with
      actually using this callback.  Hopefully somebody will find time to
      experiment with it before 9.6 feature freeze.
    • Add missing include for self-containment · be6de4c1
      Peter Eisentraut committed
    • Fix EXPLAIN ANALYZE SELECT INTO not to choose a parallel plan. · 270b7daf
      Robert Haas committed
      We don't support any parallel write operations at present, so choosing
      a parallel plan causes us to error out.  Also, add a new regression
      test that uses EXPLAIN ANALYZE SELECT INTO; if we'd had this previously,
      force_parallel_mode testing would have caught this issue.
      
      Mithun Cy and Robert Haas
    • Provide a planner hook at a suitable place for creating upper-rel Paths. · 5864d6a4
      Tom Lane committed
      In the initial revision of the upper-planner pathification work, the only
      available way for an FDW or custom-scan provider to inject Paths
      representing post-scan-join processing was to insert them during scan-level
      GetForeignPaths or similar processing.  While that's not impossible, it'd
      require quite a lot of duplicative processing to look forward and see if
      the extension would be capable of implementing the whole query.  To improve
      matters for custom-scan providers, provide a hook function at the point
      where the core code is about to start filling in upperrel Paths.  At this
      point Paths are available for the whole scan/join tree, which should reduce
      the amount of redundant effort considerably.
      
      (An alternative design that was suggested was to provide a separate hook
      for each post-scan-join processing step, but that seems messy and not
      clearly more useful.)
      
      Following our time-honored tradition, there's no documentation for this
      hook outside the source code.
      
      As-is, this hook is only meant for custom scan providers, which we can't
      assume very much about.  A follow-on patch will implement an FDW callback
      to let FDWs do the same thing in a somewhat more structured fashion.
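
      For readers new to the idiom, here is a minimal sketch of how a
      PostgreSQL-style hook usually works. Core code calls a global function
      pointer only if an extension has set it. The extension saves and chains
      to any previously installed hook, typically from its _PG_init()
      function. The hook type, its one-argument signature and all names below
      are hypothetical. The real create_upper_paths_hook takes
      planner-specific arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type; the real hook's signature differs. */
typedef void (*upper_paths_hook_type) (void *root);

/* Core side: a global hook pointer that stays NULL unless an extension sets it. */
static upper_paths_hook_type upper_paths_hook = NULL;

static void
core_plan_upper_rels(void *root)
{
    /* Scan/join Paths exist at this point; let an extension inject its own. */
    if (upper_paths_hook)
        upper_paths_hook(root);
    /* ... core code would now fill in its own upper-rel Paths ... */
}

/* Extension side: remember any previous hook so several extensions can coexist. */
static upper_paths_hook_type prev_upper_paths_hook = NULL;
static int my_hook_calls = 0;

static void
my_upper_paths_hook(void *root)
{
    if (prev_upper_paths_hook)
        prev_upper_paths_hook(root);
    my_hook_calls++;            /* a custom-scan provider would add Paths here */
}

static void
my_extension_init(void)
{
    prev_upper_paths_hook = upper_paths_hook;
    upper_paths_hook = my_upper_paths_hook;
}
```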
    • Allow callers of create_foreignscan_path to specify nondefault PathTarget. · 28048cba
      Tom Lane committed
      Although the default choice of rel->reltarget should typically be
      sufficient for scan or join paths, it's not at all sufficient for the
      purposes PathTargets were invented for; in particular not for
      upper-relation Paths.  So break API compatibility by adding a PathTarget
      argument to create_foreignscan_path().  To ease updating of existing
      code, accept a NULL value of the argument as selecting rel->reltarget.
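
      The sketch below shows the NULL-means-default convention. It uses
      drastically simplified, hypothetical structs. The real PathTarget,
      RelOptInfo and Path carry much more state, and
      create_foreignscan_path() takes many more arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, drastically simplified planner structures. */
typedef struct PathTarget { int width; } PathTarget;
typedef struct RelOptInfo { PathTarget *reltarget; } RelOptInfo;
typedef struct Path { RelOptInfo *parent; PathTarget *pathtarget; } Path;

/*
 * A caller-supplied target is used as-is; NULL selects the
 * relation's default reltarget, which keeps old call sites simple.
 */
static void
init_path_target(Path *path, RelOptInfo *rel, PathTarget *target)
{
    assert(path != NULL && rel != NULL);
    path->parent = rel;
    path->pathtarget = target ? target : rel->reltarget;
}
```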
    • Rethink representation of PathTargets. · 307c7885
      Tom Lane committed
      In commit 19a54114 I did not make PathTarget a subtype of Node,
      and embedded a RelOptInfo's reltarget directly into it rather than having
      a separately-allocated Node.  In hindsight that was misguided
      micro-optimization, enabled by the fact that at that point we didn't have
      any Paths with custom PathTargets.  Now that PathTarget processing has
      been fleshed out some more, it's easier to see that it's better to have
      PathTarget as an independent Node type, even if it does cost us one more
      palloc to create a RelOptInfo.  So change it while we still can.
      
      This commit just changes the representation, without doing anything more
      interesting than that.
    • Update PL/Perl's comment about hv_store(). · 07341a29
      Tom Lane committed
      Negative klen is documented since Perl 5.16, and 5.6 is no longer
      supported so no need to comment about it.
      
      Dagfinn Ilmari Mannsåker
    • Improve conversions from uint64 to Perl types. · f3f3aae4
      Tom Lane committed
      Perl's integers are pointer-sized, so can hold more than INT_MAX on LP64
      platforms, and come in both signed (IV) and unsigned (UV).  Floating
      point values (NV) may also be larger than double.
      
      Since Perl 5.19.4 array indices are SSize_t instead of I32, so allow up
      to SSize_t_max on those versions.  The limit is not imposed just by
      av_extend's argument type, but all the array handling code, so remove
      the speculative comment.
      
      Dagfinn Ilmari Mannsåker
    • Update more comments for 96198d94. · 6be84eeb
      Robert Haas committed
      Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh
      Bapat, and slightly expanded by me.
    • Use repalloc_huge() to enlarge a SPITupleTable's tuple pointer array. · 74a379b9
      Tom Lane committed
      Commit 23a27b03 widened the rows-stored counters to uint64, but
      that's academic unless we allow the tuple pointer array to exceed 1GB.
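
      To make the 1GB ceiling concrete: PostgreSQL's MaxAllocSize is
      0x3fffffff bytes (1 gigabyte - 1). That is the most a regular palloc()
      or repalloc() will hand out. The sketch below computes how many tuple
      pointers fit under that ceiling. With 8-byte pointers the answer is
      134,217,727, far below what a uint64 counter can express. The function
      name is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors PostgreSQL's MaxAllocSize (1 gigabyte - 1). */
#define SKETCH_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* How many pointers of pointer_size bytes fit in one allocation of alloc_limit bytes. */
static uint64_t
max_tuple_pointers(size_t alloc_limit, size_t pointer_size)
{
    assert(pointer_size > 0);
    return (uint64_t) (alloc_limit / pointer_size);
}
```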
      
      (It might be a good idea to provide some other limit on how much storage
      a SPITupleTable can eat.  On the other hand, there are plenty of other
      ways to drive a backend into swap hell.)
      
      Dagfinn Ilmari Mannsåker
    • Improve check for overly-long extensible node name. · 3adf9ced
      Robert Haas committed
      The old code is bad for two reasons.  First, it has an off-by-one
      error.  Second, it won't help if you aren't running with assertions
      enabled.  Per discussion, we want a check here in that case too.
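
      Here is a sketch of the off-by-one, assuming a fixed 64-byte name
      buffer. The buffer size and all names are illustrative, not taken from
      the patch. A name fits only if strlen(name) + 1 bytes are available, so
      the test must use < rather than <=. The check in copy_name() is an
      ordinary run-time test, so it applies in every build and not only in
      assert-enabled ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative fixed-size name buffer, including room for the NUL. */
#define NAME_BUF_LEN 64

/* Buggy: lets a 64-character name through, leaving no room for the NUL. */
static bool
name_fits_buggy(const char *name)
{
    return strlen(name) <= NAME_BUF_LEN;
}

/* Fixed: strlen(name) + 1 bytes must fit in the buffer. */
static bool
name_fits(const char *name)
{
    return strlen(name) < NAME_BUF_LEN;
}

/* Checked in every build, not only when assertions are enabled. */
static int
copy_name(char dst[NAME_BUF_LEN], const char *name)
{
    if (!name_fits(name))
        return -1;              /* report an error instead of overflowing */
    memcpy(dst, name, strlen(name) + 1);
    return 0;
}
```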
      
      Author: KaiGai Kohei, adjusted by me.
      Reviewed-by: Petr Jelinek
      Discussion: 56E0D547.1030101@2ndquadrant.com
    • pg_stat_get_progress_info() should be marked STRICT. · 2da75499
      Tom Lane committed
      I didn't bother with a catversion bump.
      
      Report and patch by Thomas Munro
  3. 14 Mar, 2016 · 2 commits
    • Fix memory leak in repeated GIN index searches. · ab4ff288
      Tom Lane committed
      Commit d88976cf removed this code from ginFreeScanKeys():
      -		if (entry->list)
      -			pfree(entry->list);
      evidently in the belief that that ItemPointer array is allocated in the
      keyCtx and so would be reclaimed by the following MemoryContextReset.
      Unfortunately, it isn't and it won't.  It'd likely be a good idea for
      that to become so, but as a simple and back-patchable fix in the
      meantime, restore this code to ginFreeScanKeys().
      
      Also, add a similar pfree to where startScanEntry() is about to zero out
      entry->list.  I am not sure if there are any code paths where this
      change prevents a leak today, but it seems like cheap future-proofing.
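
      The following is a simplified model of the leak and the fix. It uses
      malloc()/free() plus an allocation counter in place of palloc() and
      memory contexts, and the struct and function names are hypothetical.
      The item array lives outside the context that gets reset, so it has to
      be freed explicitly in two places: when the scan keys are released, and
      before the pointer is overwritten.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live allocations so a leak is observable. */
static int live_allocs = 0;

static void *
tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        live_allocs++;
    return p;
}

static void
tracked_free(void *p)
{
    if (p != NULL)
    {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical stand-in for a scan entry holding an item array. */
typedef struct ScanEntry
{
    int *list;                  /* NOT reclaimed by any context reset */
    int  nlist;
} ScanEntry;

/* Load fresh results, freeing the old array before overwriting the pointer. */
static void
start_scan_entry(ScanEntry *entry, int n)
{
    tracked_free(entry->list);
    entry->list = NULL;
    entry->nlist = 0;
    if (n > 0)
    {
        entry->list = tracked_alloc(sizeof(int) * (size_t) n);
        entry->nlist = entry->list ? n : 0;
    }
}

/* Analogous to freeing scan keys: release the array explicitly. */
static void
free_scan_entry(ScanEntry *entry)
{
    tracked_free(entry->list);
    entry->list = NULL;
    entry->nlist = 0;
}
```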
      
      In passing, make the initial allocation of so->entries[] use palloc
      not palloc0.  The code doesn't depend on unused entries being zero;
      if it did, the array-enlargement code in ginFillScanEntry() would be
      wrong.  So using palloc0 initially can only serve to confuse readers
      about what the invariant is.
      
      Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in
      <CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
    • 96adb14d
  4. 13 Mar, 2016 · 7 commits
    • Fix order of MemSet arguments · a1aa8b7e
      Magnus Hagander committed
      Noted by Tomas Vondra
    • Report memory context stats upon out-of-memory in repalloc[_huge]. · 4b980167
      Tom Lane committed
      This longstanding functionality evidently got lost in commit
      3d6d1b58.  Noted while studying an OOM report from Jaime
      Casanova.  Backpatch to 9.5 where the bug was introduced.
    • Fix Windows portability issue in 23a27b03. · ab737f6b
      Tom Lane committed
      _strtoui64() is available in MSVC builds, but apparently not with
      other Windows toolchains.  Thanks to Petr Jelinek for the diagnosis.
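
      One portable way to parse an unsigned 64-bit value is sketched below.
      It uses MSVC's _strtoui64() only where _MSC_VER is defined, and C99's
      strtoull() everywhere else, including other Windows toolchains that
      provide C99 functions. The sketch_strtou64 macro and parse_u64() are
      hypothetical names, not what the commit itself adds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical portability macro: MSVC-only function vs. standard C99. */
#ifdef _MSC_VER
#define sketch_strtou64(s, endp, base) _strtoui64((s), (endp), (base))
#else
#define sketch_strtou64(s, endp, base) strtoull((s), (endp), (base))
#endif

/* Parse a whole decimal string; returns 0 on success, -1 on error or overflow. */
static int
parse_u64(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = sketch_strtou64(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    *out = (uint64_t) v;
    return 0;
}
```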
    • Get rid of scribbling on a const variable in psql's print.c. · fc7a9dfd
      Tom Lane committed
      Commit a2dabf0e had the bright idea that it could modify a "const"
      global variable if it merely cast away const from a pointer.  This does
      not work on platforms where the compiler puts "const" variables into
      read-only storage.  Depressingly, we evidently have no such platforms in
      our buildfarm ... an oversight I have now remedied.  (The one platform
      that is known to catch this is recent OS X with -fno-common.)
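
      Here is a minimal illustration of the portable alternative. A
      hypothetical PrintFormat struct stands in for psql's print settings.
      The idea is to copy the const defaults into writable storage and modify
      the copy. Casting away const and writing through the pointer is
      undefined behavior, and it crashes where const data lives in read-only
      memory.

```c
#include <assert.h>

/* Hypothetical settings struct standing in for psql's print options. */
typedef struct PrintFormat
{
    char border_char;
    int  width;
} PrintFormat;

/* A const global that the compiler may place in read-only storage. */
static const PrintFormat default_format = {'-', 80};

/*
 * Don't do this: it is undefined behavior and can crash where const
 * data is read-only:
 *     ((PrintFormat *) &default_format)->width = 120;
 *
 * Instead, copy the defaults into writable storage and modify the copy.
 */
static PrintFormat
make_format(int width)
{
    PrintFormat fmt = default_format;   /* struct copy into a local */

    assert(width > 0);
    fmt.width = width;
    return fmt;
}
```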
      
      Per report from Chris Ruprecht.  Back-patch to 9.5 where the bogus
      code was introduced.
    • Widen query numbers-of-tuples-processed counters to uint64. · 23a27b03
      Tom Lane committed
      This patch widens SPI_processed, EState's es_processed field, PortalData's
      portalPos field, FuncCallContext's call_cntr and max_calls fields,
      ExecutorRun's count argument, PortalRunFetch's result, and the max number
      of rows in a SPITupleTable to uint64, and deals with (I hope) all the
      ensuing fallout.  Some of these values were declared uint32 before, and
      others "long".
      
      I also removed PortalData's posOverflow field, since that logic seems
      pretty useless given that portalPos is now always 64 bits.
      
      The user-visible results are that command tags for SELECT etc will
      correctly report tuple counts larger than 4G, as will plpgsql's
      GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
      than that are still not exactly the norm, but they're becoming more
      common.
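
      The small sketch below shows why the width matters. A uint32 counter
      silently wraps modulo 2^32, while a uint64 count can be printed
      portably with the <inttypes.h> PRIu64 macro. The helper names are
      illustrative; this is not PostgreSQL's command-tag code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a "SELECT <n>" tag from a 64-bit count; PRIu64 keeps it portable. */
static int
format_select_tag(char *buf, size_t buflen, uint64_t nprocessed)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "SELECT %" PRIu64, nprocessed);
}

/* What a 32-bit counter would report after n rows: the count wraps modulo 2^32. */
static uint32_t
as_uint32_counter(uint64_t n)
{
    return (uint32_t) n;
}
```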
      
      Most values associated with FETCH/MOVE distances, such as PortalRun's count
      argument and the count argument of most SPI functions that have one, remain
      declared as "long".  It's not clear whether it would be worth promoting
      those to int64; but it would definitely be a large dollop of additional
      API churn on top of this, and it would only help 32-bit platforms which
      seem relatively less likely to see any benefit.
      
      Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
    • Include portability/mem.h into fd.c for MAP_FAILED. · e0115750
      Andres Freund committed
      Buildfarm members gaur and pademelon are old enough not to know about
      MAP_FAILED, which is used in 428b1d6b. Include portability/mem.h to fix,
      as already done in a bunch of other places.
    • Re-export a few of createplan.c's make_xxx() functions. · 570be1f7
      Tom Lane committed
      CitusDB is using these and doesn't wish to redesign its code right now.
      I am not on board with this being a good idea, or a good precedent,
      but I lack the energy to fight about it.
  5. 12 Mar, 2016 · 6 commits
    • pg_upgrade: Convert old visibility map format to new format. · 7087166a
      Robert Haas committed
      Commit a892234f added a second bit per
      page to the visibility map, but pg_upgrade has been unaware of it up
      until now.  Therefore, a pg_upgrade from an earlier major release of
      PostgreSQL to any commit preceding this one and following the one
      mentioned above would result in invalid visibility map contents on the
      new cluster, very possibly leading to data corruption.  This plugs
      that hole.
      
      Masahiko Sawada, reviewed by Jeff Janes, Bruce Momjian, Simon Riggs,
      Michael Paquier, Andres Freund, me, and others.
    • When appropriate, postpone SELECT output expressions till after ORDER BY. · 9118d03a
      Tom Lane committed
      It is frequently useful for volatile, set-returning, or expensive functions
      in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
      done.  Otherwise, the functions might be executed for every row of the
      table despite the presence of LIMIT, and/or be executed in an unexpected
      order.  For example, in
      	SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
      it's probably desirable that the nextval() values are ordered the same
      as x, and that nextval() is not run more than 10 times.
      
      In the past, Postgres was inconsistent in this area: you would get the
      desirable behavior if the ordering were performed via an indexscan, but
      not if it had to be done by an explicit sort step.  Getting the desired
      behavior reliably required contortions like
      	SELECT x, nextval('seq')
      	  FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;
      
      This patch conditionally postpones evaluation of pure-output target
      expressions (that is, those that are not used as DISTINCT, ORDER BY, or
      GROUP BY columns) so that they effectively occur after sorting, even if an
      explicit sort step is necessary.  Volatile expressions and set-returning
      expressions are always postponed, so as to provide consistent semantics.
      Expensive expressions (costing more than 10 times typical operator cost,
      which by default would include any user-defined function) are postponed
      if there is a LIMIT or if there are expressions that must be postponed.
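
      The decision rules described above can be sketched over a hypothetical
      per-expression summary. "Typical operator cost" is assumed here to mean
      cpu_operator_cost at its default of 0.0025. The struct, the function
      and the two-pass structure are illustrative, not the planner's actual
      code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one SELECT output expression. */
typedef struct OutputExpr
{
    bool   is_sort_group_col;   /* used in DISTINCT, ORDER BY, or GROUP BY */
    bool   is_volatile;
    bool   returns_set;
    double eval_cost;           /* estimated per-row evaluation cost */
} OutputExpr;

/* Assumed "typical operator cost": cpu_operator_cost's default. */
#define SKETCH_CPU_OPERATOR_COST 0.0025

/* Mark which expressions to evaluate after sort/LIMIT; returns how many. */
static int
mark_postponed(const OutputExpr *exprs, size_t n, bool has_limit, bool *postpone)
{
    bool any_required = false;
    int  count = 0;

    assert(n == 0 || (exprs != NULL && postpone != NULL));

    /* Pass 1: volatile and set-returning pure-output expressions always go last. */
    for (size_t i = 0; i < n; i++)
    {
        postpone[i] = !exprs[i].is_sort_group_col &&
            (exprs[i].is_volatile || exprs[i].returns_set);
        if (postpone[i])
            any_required = true;
    }

    /* Pass 2: expensive ones too, if there is a LIMIT or a required postponement. */
    for (size_t i = 0; i < n; i++)
    {
        if (!postpone[i] && !exprs[i].is_sort_group_col &&
            (has_limit || any_required) &&
            exprs[i].eval_cost > 10 * SKETCH_CPU_OPERATOR_COST)
            postpone[i] = true;
        if (postpone[i])
            count++;
    }
    return count;
}
```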
      
      We could be more aggressive and postpone any nontrivial expression, but
      there are costs associated with doing so: it requires an extra Result plan
      node which adds some overhead, and postponement changes the volume of data
      going through the sort step, perhaps for the worse.  Since we tend not to
      have very good estimates of the output width of nontrivial expressions,
      it's hard to have much confidence in our ability to predict whether
      postponement would increase or decrease the cost of the sort; therefore
      this patch doesn't attempt to make decisions conditionally on that.
      Between these factors and a general desire not to change query behavior
      when there's not a demonstrable benefit, it seems best to be conservative
      about applying postponement.  We might tweak the decision rules in the
      future, though.
      
      Konstantin Knizhnik, heavily rewritten by me
    • Fix Windows build broken in 6943a946 · b1fdc727
      Teodor Sigaev committed
      It also fixes a dynamic array allocation disallowed by ANSI C.
      
      Author: Stas Kelvich
    • Fix merge affixes for numeric ones · 8829af47
      Teodor Sigaev committed
      Some dictionaries have duplicated base words with different affix sets; we
      just merge those sets into one set. But previously, merging sets of affixes
      was actually a concatenation of strings, which is wrong for the numeric
      representation of affixes because that representation uses a comma to
      separate affixes.
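
      Here is a minimal sketch of the fix for the numeric case, using a
      hypothetical helper. When both sets are non-empty they must be joined
      with a comma. Plain concatenation would turn "1,2" and "3" into "1,23".
      This sketch does not remove duplicate affix numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Merge two numeric affix-set strings; returns 0 on success, -1 if dst is too small. */
static int
merge_numeric_affixes(char *dst, size_t dstlen, const char *a, const char *b)
{
    int n;

    assert(dst != NULL && a != NULL && b != NULL);
    if (*a == '\0')
        n = snprintf(dst, dstlen, "%s", b);
    else if (*b == '\0')
        n = snprintf(dst, dstlen, "%s", a);
    else
        n = snprintf(dst, dstlen, "%s,%s", a, b);   /* comma keeps the numbers apart */
    return (n < 0 || (size_t) n >= dstlen) ? -1 : 0;
}
```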
      
      Author: Artur Zakirov
    • Bump catalog version missed in 6943a946 · a9eb6c83
      Teodor Sigaev committed
    • Tsvector editing functions · 6943a946
      Teodor Sigaev committed
      Adds several tsvector editing functions: convert tsvector to/from text array,
      set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
      with given weights
      
      Author: Stas Kelvich with some editorization by me
      Reviewers: Tomas Vondra, Teodor Sigaev
  6. 11 Mar, 2016 · 11 commits
    • Minor additional refactoring of planner.c's PathTarget handling. · 49635d7b
      Tom Lane committed
      Teach make_group_input_target() and make_window_input_target() to work
      entirely with the PathTarget representation of tlists, rather than
      constructing a tlist and immediately deconstructing it into PathTarget
      format.  In itself this only saves a few palloc's; the bigger picture is
      that it opens the door for sharing cost_qual_eval work across all of
      planner.c's constructions of PathTargets.  I'll come back to that later.
      
      In support of this, flesh out tlist.c's infrastructure for PathTargets
      a bit more.
    • psql: Don't automatically use expanded format when there's 1 column. · 69ab7b9d
      Robert Haas committed
      Andreas Karlsson and Robert Haas
    • Fix a typo, and remove unnecessary pgstat_report_wait_end(). · 481c76ab
      Robert Haas committed
      Per Amit Kapila.
    • Refactor receivelog.c parameters · 38c83c9b
      Magnus Hagander committed
      Much cruft had accumulated over time, with a large number of parameters
      passed down very deep between functions. This refactoring instead
      introduces a StreamCtl structure that holds the parameters, and passes
      around a pointer to this structure. This makes it much easier to add or
      remove fields that are needed deeper down in the implementation without
      having to modify every function header in the file.
      
      Patch by me after much nagging from Andres
      Reviewed by Craig Ringer and Daniel Gustafsson
    • Allow emit_log_hook to see original message text · 73e7e49d
      Simon Riggs committed
      emit_log_hook could only see the translated text, making it harder to identify
      which message was being sent. Pass original text to allow the exact message to
      be identified, whichever language is used for logging.
      
      Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
      Author: Kyotaro Horiguchi
    • Simplify GetLockNameFromTagType. · a414d96a
      Robert Haas committed
      The old code is wrong, because it returns a pointer to an automatic
      variable.  And it's also more clever than we really need to be
      considering that the case it's worrying about should never happen.
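
      The bug and the fix in miniature, with hypothetical names: returning a
      pointer to a local array leaves the caller with a dangling pointer.
      String literals have static storage duration, so returning them is
      safe. For an out-of-range tag type, which should never happen, a
      constant placeholder is enough.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock tag types. */
typedef enum SketchLockTagType
{
    SKETCH_LOCKTAG_RELATION,
    SKETCH_LOCKTAG_PAGE,
    SKETCH_LOCKTAG_TUPLE
} SketchLockTagType;

#define SKETCH_LOCKTAG_LAST_TYPE SKETCH_LOCKTAG_TUPLE

static const char *const lock_tag_names[] = {
    "relation",
    "page",
    "tuple"
};

/*
 * Wrong: building the name in a local array and returning it, e.g.
 *     char buf[32]; ...; return buf;
 * hands the caller a dangling pointer once this function returns.
 */
static const char *
lock_name_from_tag_type(unsigned int tag_type)
{
    if (tag_type > SKETCH_LOCKTAG_LAST_TYPE)
        return "???";           /* should never happen; a constant suffices */
    return lock_tag_names[tag_type];
}
```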
    • Blindly try to fix dtrace enabled builds, broken in 9cd00c45. · c94f0c29
      Andres Freund committed
      Reported-By: Peter Eisentraut
      Discussion: 56E2239E.1050607@gmx.net
    • Checkpoint sorting and balancing. · 9cd00c45
      Andres Freund committed
      Up to now, checkpoints wrote buffers out in the order they appear in the
      BufferDescriptors array. That's nearly random in a lot of cases, which
      performs badly on rotating media, but even on SSDs it causes slowdowns.
      
      To avoid that, sort the buffers to be checkpointed before writing them out. We currently
      sort by tablespace, relfilenode, fork and block number.
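
      Here is a sketch of that sort order, using qsort() over a hypothetical
      per-buffer key. It illustrates the idea and is not PostgreSQL's actual
      struct. Balancing writes across tablespaces is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort key for one dirty buffer. */
typedef struct CkptItem
{
    uint32_t tablespace;
    uint32_t relfilenode;
    int      fork;
    uint32_t block;
    int      buf_id;            /* which shared buffer to write */
} CkptItem;

/* Three-way comparison without subtraction, which could overflow. */
static int
cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

static int
ckpt_item_cmp(const void *pa, const void *pb)
{
    const CkptItem *a = pa;
    const CkptItem *b = pb;
    int         c;

    if ((c = cmp_u32(a->tablespace, b->tablespace)) != 0)
        return c;
    if ((c = cmp_u32(a->relfilenode, b->relfilenode)) != 0)
        return c;
    if ((c = (a->fork > b->fork) - (a->fork < b->fork)) != 0)
        return c;
    return cmp_u32(a->block, b->block);
}

/* Sort buffers so writes to each file are issued in block order. */
static void
sort_checkpoint_items(CkptItem *items, size_t n)
{
    assert(items != NULL);
    if (n > 1)
        qsort(items, n, sizeof(CkptItem), ckpt_item_cmp);
}
```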
      
      One of the major reasons this wasn't done previously was fear of
      imbalance between tablespaces. To address that, balance writes between
      tablespaces.
      
      The other prime concern was that the relatively large allocation to sort
      the buffers in might fail, preventing checkpoints from happening. Thus
      pre-allocate the required memory in shared memory, at server startup.
      
      This particularly makes it more efficient to have checkpoint flushing
      enabled, because that'll often result in a lot of writes that can be
      coalesced into one flush.
      
      Discussion: alpine.DEB.2.10.1506011320000.28433@sto
      Author: Fabien Coelho and Andres Freund
    • Allow to trigger kernel writeback after a configurable number of writes. · 428b1d6b
      Andres Freund committed
      Currently writes to the main data files of postgres all go through the
      OS page cache. This means that some operating systems can end up
      collecting a large number of dirty buffers in their respective page
      caches.  When these dirty buffers are flushed to storage rapidly, be it
      because of fsync(), timeouts, or dirty ratios, latency for other reads
      and writes can increase massively.  This is the primary reason for
      regular massive stalls observed in real world scenarios and artificial
      benchmarks; on rotating disks stalls on the order of hundreds of seconds
      have been observed.
      
      On Linux it is possible to control this by reducing the global dirty
      limits significantly, reducing the above problem. But global
      configuration is rather problematic because it'll affect other
      applications; also PostgreSQL itself doesn't always want this
      behavior, e.g. for temporary files it's undesirable.
      
      Several operating systems allow some control over the kernel page
      cache. Linux has sync_file_range(2), several POSIX systems have msync(2)
      and posix_fadvise(2). sync_file_range(2) is preferable because it
      requires no special setup, whereas msync() requires the to-be-flushed
      range to be mmap'ed. For the purpose of flushing dirty data
      posix_fadvise(2) is the worst alternative, as flushing dirty data is
      just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
      from the page cache.  Thus the feature is enabled by default only on
      Linux, but can be enabled on all systems that have any of the above
      APIs.
      
      While desirable and likely possible, this patch does not contain an
      implementation for Windows.
      
      With the infrastructure added, writes made via checkpointer, bgwriter
      and normal user backends can be flushed after a configurable number of
      writes. Each of these sources of writes is controlled by a separate GUC,
      checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
      respectively; they're separate because the appropriate number of writes
      between flushes differs for each source, and because the performance
      considerations of controlled flushing for each of these are different.
      
      A later patch will add checkpoint sorting - after that flushes from the
      checkpoint will almost always be desirable. Bgwriter flushes are most of
      the time going to be random, which are slow on lots of storage hardware.
      Flushing in backends works well if the storage and bgwriter can keep up,
      but if not it can have negative consequences.  This patch is likely to
      have negative performance consequences without checkpoint sorting, but
      unfortunately so has sorting without flush control.
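
      Here is a minimal sketch of "flush after N writes", with hypothetical
      names. Each write source would get its own tracker, mirroring the three
      separate settings, and a flush_after of 0 disables controlled flushing.
      The flush is a callback so the logic runs without touching files; a
      real implementation would issue something like sync_file_range(2) on
      Linux at that point. This sketch omits any sorting or merging of the
      pending ranges.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*flush_fn) (void *arg, size_t npending);

/* Hypothetical per-source writeback tracker. */
typedef struct WritebackTracker
{
    size_t   flush_after;       /* 0 disables controlled flushing */
    size_t   npending;          /* writes issued since the last flush */
    flush_fn flush;
    void    *arg;
} WritebackTracker;

static void
tracker_note_write(WritebackTracker *t)
{
    assert(t != NULL && t->flush != NULL);
    if (t->flush_after == 0)
        return;                 /* leave everything to the kernel's own policy */
    if (++t->npending >= t->flush_after)
    {
        t->flush(t->arg, t->npending);  /* e.g. request writeback of these ranges */
        t->npending = 0;
    }
}

/* Example callback: count how many flush requests were issued. */
static void
count_flushes(void *arg, size_t npending)
{
    (void) npending;
    (*(int *) arg)++;
}
```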
      
      Discussion: alpine.DEB.2.10.1506011320000.28433@sto
      Author: Fabien Coelho and Andres Freund
    • Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too. · c82c92b1
      Tom Lane committed
      All along, this function should have treated WindowFuncs in a manner
      similar to Aggrefs, i.e., with an option whether or not to recurse into them.
      By not considering the case, it was always recursing, which is OK for most
      callers (although I suspect that the case in prepare_sort_from_pathkeys
      might represent a bug).  But now we need return-without-recursing behavior
      as well.  There are also more than a few callers that should never see a
      WindowFunc, and now we'll get some error checking on that.
    • Don't vacuum all-frozen pages. · fd31cd26
      Robert Haas committed
      Commit a892234f gave us enough
      infrastructure to avoid vacuuming pages where every tuple on the
      page is already frozen.  So, replace the notion of a scan_all or
      whole-table vacuum with the less onerous notion of an "aggressive"
      vacuum, which will scan pages that are all-visible, but still skip those
      that are all-frozen.
      
      This should greatly reduce the cost of anti-wraparound vacuuming
      on large clusters where the majority of data is never touched
      between one cycle and the next, because we'll no longer have to
      read all of those pages only to find out that we don't need to
      do anything with them.
      
      Patch by me, reviewed by Masahiko Sawada.