提交 · ec73b56a31fd0933280e85cd4e7b17c45c2ccbed · Greenplum / Gpdb

04 8月, 2008 1 次提交

Make GROUP BY work properly for datatypes that only support hashing and not · ec73b56a

由 Tom Lane 提交于 8月 03, 2008

sorting.  The infrastructure for this was all in place already; it's only
necessary to fix the planner to not assume that sorting is always an available
option.

ec73b56a

03 8月, 2008 1 次提交

Rearrange the querytree representation of ORDER BY/GROUP BY/DISTINCT items · 95113047

由 Tom Lane 提交于 8月 02, 2008

as per my recent proposal:

1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().

2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator. This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning. In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.

3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON. This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.

This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.

95113047

01 8月, 2008 1 次提交

Fix parser so that we don't modify the user-written ORDER BY list in order · 63247bec

由 Tom Lane 提交于 7月 31, 2008

to represent DISTINCT or DISTINCT ON. This gets rid of a longstanding
annoyance that a view or rule using SELECT DISTINCT will be dumped out
with an overspecified ORDER BY list, and is one small step along the way
to decoupling DISTINCT and ORDER BY enough so that hash-based implementation
of DISTINCT will be possible. In passing, improve transformDistinctClause
so that it doesn't reject duplicate DISTINCT ON items, as was reported by
Steve Midgley a couple weeks ago.

63247bec

10 7月, 2008 2 次提交

Tighten up SS_finalize_plan's computation of valid_params to exclude Params of · eaf1b5d3

由 Tom Lane 提交于 7月 10, 2008

the current query level that aren't in fact output parameters of the current
initPlans. (This means, for example, output parameters of regular subplans.)
To make this work correctly for output parameters coming from sibling
initplans requires rejiggering the API of SS_finalize_plan just a bit:
we need the siblings to be visible to it, rather than hidden as
SS_make_initplan_from_plan had been doing. This is really part of my response
to bug #4290, but I concluded this part probably shouldn't be back-patched,
since all that it's doing is to make a debugging cross-check tighter.

eaf1b5d3

Fix mis-calculation of extParam/allParam sets for plan nodes, as seen in · 772a6d45

由 Tom Lane 提交于 7月 10, 2008

bug #4290. The fundamental bug is that masking extParam by outer_params,
as finalize_plan had been doing, caused us to lose the information that
an initPlan depended on the output of a sibling initPlan. On reflection
the best thing to do seemed to be not to try to adjust outer_params for
this case but get rid of it entirely. The only thing it was really doing
for us was to filter out param IDs associated with SubPlan nodes, and that
can be done (with greater accuracy) while processing individual SubPlan
nodes in finalize_primnode. This approach was vindicated by the discovery
that the masking method was hiding a second bug: SS_finalize_plan failed to
remove extParam bits for initPlan output params that were referenced in the
main plan tree (it only got rid of those referenced by other initPlans).
It's not clear that this caused any real problems, given the limited use
of extParam by the executor, but it's certainly not what was intended.

I originally thought that there was also a problem with needing to include
indirect dependencies on external params in initPlans' param sets, but it
turns out that the executor handles this correctly so long as the depended-on
initPlan is earlier in the initPlans list than the one using its output.
That seems a bit of a fragile assumption, but it is true at the moment,
so I just documented it in some code comments rather than making what would
be rather invasive changes to remove the assumption.

Back-patch to 8.1. Previous versions don't have the case of initPlans
referring to other initPlans' outputs, so while the existing logic is still
questionable for them, there are not any known bugs to be fixed. So I'll
refrain from changing them for now.

772a6d45

28 6月, 2008 1 次提交

Consider a clause to be outerjoin_delayed if it references the nullable side · dcc23347

由 Tom Lane 提交于 6月 27, 2008

of any lower outer join, even if it also references the non-nullable side and
so could not get pushed below the outer join anyway. We need this in case
the clause is an OR clause: if it doesn't get marked outerjoin_delayed,
create_or_index_quals() could pull an indexable restriction for the nullable
side out of it, leading to wrong results as demonstrated by today's bug
report from toruvinn. (See added regression test case for an example.)

In principle this has been wrong for quite a while. In practice I don't
think any branch before 8.3 can really show the failure, because
create_or_index_quals() will only pull out indexable conditions, and before
8.3 those were always strict. So though we might have improperly generated
null-extended rows in the outer join, they'd get discarded from the result
anyway. The gating factor that makes the failure visible is that 8.3
considers "col IS NULL" to be indexable. Hence I'm not going to risk
back-patching further than 8.3.

dcc23347

27 6月, 2008 1 次提交

Improve planner's estimation of the size of an append relation: rather than · 2c2161a4

由 Tom Lane 提交于 6月 27, 2008

taking the maximum of any child rel's width, we should weight the widths
proportionally to the number of rows expected from each child. In hindsight
this is obviously correct because row width is really a proxy for the total
physical size of the relation. Per discussion with Scott Carey (bug #4264).

2c2161a4

19 6月, 2008 1 次提交

Improve our #include situation by moving pointer types away from the · a3540b0f

由 Alvaro Herrera 提交于 6月 19, 2008

corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.

a3540b0f

17 6月, 2008 1 次提交

Fix the code that adds regclass constants to a plan's list of relation OIDs · 2e835a49

由 Tom Lane 提交于 6月 17, 2008

that it depends on for replan-forcing purposes. We need to consider plain OID
constants too, because eval_const_expressions folds a RelabelType atop a Const
to just a Const. This change could result in OID values that aren't really
for tables getting added to the dependency list, but the worst-case
consequence would be occasional useless replans. Per report from Gabriele
Messineo.

2e835a49

03 5月, 2008 1 次提交
- T
  Allow the planner's estimate of the fraction of a cursor's rows that will be · db147b34
  由 Tom Lane 提交于 5月 02, 2008
```
retrieved to be controlled through a GUC variable.

Robert Hell
```
  db147b34
22 4月, 2008 1 次提交

Fix convert_IN_to_join to properly handle the case where the subselect's · ff673f55

由 Tom Lane 提交于 4月 21, 2008

output is not of the same type that's needed for the IN comparison (ie,
where the parser inserted an implicit coercion above the subselect result).
We should record the coerced expression, not just a raw Var referencing
the subselect output, as the quantity that needs to be unique-ified if
we choose to implement the IN as Unique followed by a plain join.

As of 8.3 this error was causing crashes, as seen in bug #4113 from Javier
Hernandez, because the executor was being told to hash or sort the raw
subselect output column using operators appropriate to the coerced type.

In prior versions there was no crash because the executor chose the
hash or sort operators for itself based on the column type it saw.
However, that's still not really right, because what's unique for one data
type might not be unique for another. In corner cases we could get multiple
outputs of a row that should appear only once, as demonstrated by the
regression test case included in this commit.

However, this patch doesn't apply cleanly to 8.2 or before, and the code
involved has shifted enough over time that I'm hesitant to try to back-patch.
Given the lack of complaints from the field about such corner cases, I think
the bug may not be important enough to risk breaking other things with a
back-patch.

ff673f55

21 4月, 2008 1 次提交

Allow float8, int8, and related datatypes to be passed by value on machines · 8472bf7a

由 Tom Lane 提交于 4月 21, 2008

where Datum is 8 bytes wide.  Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior.  Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.

Zoltan Boszormenyi, plus configurability stuff by me.

8472bf7a

18 4月, 2008 1 次提交

Fix a couple of oversights associated with the "physical tlist" optimization: · 25e46a50

由 Tom Lane 提交于 4月 17, 2008

we had several code paths where a physical tlist could be used for the input
to a Sort node, which is a dumb idea because any unneeded table columns will
increase the volume of data the sort has to push around.

(Unfortunately the easy-looking fix of calling disuse_physical_tlist during
make_sort_xxx doesn't work because in most cases we're already committed to
the current input tlist --- it's been marked with sort column numbers, or
we've built grouping column numbers using it, etc.  The tlist has to be
selected properly at the calling level before we start constructing sort-col
information.  This is easy enough to do, we were just failing to take the
point into consideration.)

Back-patch to 8.3.  I believe the problem probably exists clear back to 7.4
when the physical tlist optimization was added, but I'm afraid to back-patch
further than 8.3 without a great deal more study than I want to put into it.
The code in this area has drifted a lot over time.  The real-world importance
of these code paths is uncertain anyway --- I think in many cases we'd
probably prefer hash-based methods.

25e46a50

14 4月, 2008 2 次提交

Since createplan.c no longer cares whether index operators are lossy, it has · 226837e5

由 Tom Lane 提交于 4月 13, 2008

no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.

226837e5

Phase 2 of project to make index operator lossiness be determined at runtime · 24558da1

由 Tom Lane 提交于 4月 13, 2008

instead of plan time. Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now). For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions. The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.

24558da1

01 4月, 2008 2 次提交

Fix an oversight I made in a cleanup patch over a year ago: · 6b73d7e5

由 Tom Lane 提交于 4月 01, 2008

eval_const_expressions needs to be passed the PlannerInfo ("root") structure,
because in some cases we want it to substitute values for Param nodes.
(So "constant" is not so constant as all that ...)  This mistake partially
disabled optimization of unnamed extended-Query statements in 8.3: in
particular the LIKE-to-indexscan optimization would never be applied if the
LIKE pattern was passed as a parameter, and constraint exclusion depending
on a parameter value didn't work either.

6b73d7e5

T
Apply my original fix for Taiki Yamaguchi's bug report about DISTINCT MAX(). · d3441155
由 Tom Lane 提交于 3月 31, 2008
```
Add some regression tests for plausible failures in this area.
```
d3441155

29 3月, 2008 1 次提交
- T
  Revert my erroneous fix for Taiki Yamaguchi's DISTINCT MAX() bug. · 6fc9d427
  由 Tom Lane 提交于 3月 29, 2008
```
Whatever we do about that, this isn't the path to the solution.
```
  6fc9d427
28 3月, 2008 2 次提交

Department of second thoughts: the rule that ORDER BY and DISTINCT are · 2e4094da

由 Tom Lane 提交于 3月 28, 2008

useless for an ungrouped-aggregate query holds regardless of whether
optimize_minmax_aggregates succeeds.  So we might as well apply the
optimization in any case.

I'll leave 8.3 as it was, since this version is a tad more invasive
than my earlier patch.

2e4094da

When we have successfully optimized a MIN or MAX aggregate into an indexscan, · ff72280c

由 Tom Lane 提交于 3月 27, 2008

the query result must be exactly one row (since we don't do this when there's
any GROUP BY). Therefore any ORDER BY or DISTINCT attached to the query is
useless and can be dropped. Aside from saving useless cycles, this protects
us against problems with matching the hacked-up tlist entries to sort clauses,
as seen in a bug report from Taiki Yamaguchi. We might need to work harder
if we ever try to optimize grouped queries with this approach, but this
solution will do for now.

ff72280c

21 3月, 2008 2 次提交
- B
  
  More README src cleanups. · fca9fff4
  由 Bruce Momjian 提交于 3月 21, 2008
  
  fca9fff4
- B
  
  Make source code READMEs more consistent. Add CVS tags to all README files. · 4e228447
  由 Bruce Momjian 提交于 3月 20, 2008
  
  4e228447
19 3月, 2008 1 次提交

Arrange to "inline" SQL functions that appear in a query's FROM clause, · 0d49838d

由 Tom Lane 提交于 3月 18, 2008

are declared to return set, and consist of just a single SELECT.  We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view.  Patch from Richard Rowell, cleaned up
by me.

0d49838d

19 2月, 2008 1 次提交
- P
  
  Refactor backend makefiles to remove lots of duplicate code · 0474dcb6
  由 Peter Eisentraut 提交于 2月 19, 2008
  
  0474dcb6
18 1月, 2008 1 次提交

Fix subselect.c to avoid assuming that a SubLink's testexpr references each · a44174cf

由 Tom Lane 提交于 1月 17, 2008

subquery output column exactly once left-to-right. Although this is the case
in the original parser output, it might not be so after rewriting and
constant-folding, as illustrated by bug #3882 from Jan Mate. Instead
scan the subquery's target list to obtain needed per-column information;
this is duplicative of what the parser did, but only a couple dozen lines
need be copied, and we can clean up a couple of notational uglinesses.
Bug was introduced in 8.2 as part of revision of SubLink representation.

a44174cf

11 1月, 2008 1 次提交

Fix a conceptual error in my patch of 2007-10-26 that avoided considering · 59fc64ac

由 Tom Lane 提交于 1月 11, 2008

clauseless joins of relations that have unexploited join clauses. Rather
than looking at every other base relation in the query, the correct thing is
to examine the other relations in the "initial_rels" list of the current
make_rel_from_joinlist() invocation, because those are what we actually have
the ability to join against. This might be a subset of the whole query in
cases where join_collapse_limit or from_collapse_limit or full joins have
prevented merging the whole query into a single join problem. This is a bit
untidy because we have to pass those rels down through a new PlannerInfo
field, but it's necessary. Per bug #3865 from Oleg Kharin.

59fc64ac

10 1月, 2008 1 次提交

Fix some planner issues found while investigating Kevin Grittner's report · 6a652252

由 Tom Lane 提交于 1月 09, 2008

of poorer planning in 8.3 than 8.2:

1. After pushing a constant across an outer join --- ie, given
"a LEFT JOIN b ON (a.x = b.y) WHERE a.x = 42", we can deduce that b.y is
sort of equal to 42, in the sense that we needn't fetch any b rows where
it isn't 42 --- loop to see if any additional deductions can be made.
Previous releases did that by recursing, but I had mistakenly thought that
this was no longer necessary given the EquivalenceClass machinery.

2. Allow pushing constants across outer join conditions even if the
condition is outerjoin_delayed due to a lower outer join. This is safe
as long as the condition is strict and we re-test it at the upper join.

3. Keep the outer-join clause even if we successfully push a constant
across it. This is *necessary* in the outerjoin_delayed case, but
even in the simple case, it seems better to do this to ensure that the
join search order heuristics will consider the join as reasonable to
make. Mark such a clause as having selectivity 1.0, though, since it's
not going to eliminate very many rows after application of the constant
condition.

4. Tweak have_relevant_eclass_joinclause to report that two relations
are joinable when they have vars that are equated to the same constant.
We won't actually generate any joinclause from such an EquivalenceClass,
but again it seems that in such a case it's a good idea to consider
the join as worth costing out.

5. Fix a bug in select_mergejoin_clauses that was exposed by these
changes: we have to reject candidate mergejoin clauses if either side was
equated to a constant, because we can't construct a canonical pathkey list
for such a clause. This is an implementation restriction that might be
worth fixing someday, but it doesn't seem critical to get it done for 8.3.

6a652252

02 1月, 2008 1 次提交
- B
  
  Update copyrights in source tree to 2008. · 9098ab9e
  由 Bruce Momjian 提交于 1月 01, 2008
  
  9098ab9e
04 12月, 2007 1 次提交

Fix build_minmax_path() to cope if an IS NULL clause turns up in the · f538329f

由 Tom Lane 提交于 12月 03, 2007

indexable-clauses list for a btree index.  Formerly it just Asserted that
all such clauses were opclauses, but that's no longer true in 8.3.
Per bug #3796 from Matthias Schoeneich.

f538329f

24 11月, 2007 1 次提交

Change fix_scan_expr() to avoid copying the input node tree in the common case · a36436ea

由 Tom Lane 提交于 11月 24, 2007

where rtoffset == 0. In that case there is no need to change Var nodes,
and since filling in unset opfuncid fields is always safe, scribbling on the
input tree to that extent is not objectionable. This brings the cost of this
operation back down to what it was in 8.2 for simple queries. Per
investigation of performance gripe from Guillaume Smet.

a36436ea

16 11月, 2007 2 次提交
- B
  Re-run pgindent with updated list of typedefs. (Updated README should · f6e8730d
  由 Bruce Momjian 提交于 11月 15, 2007
```
avoid this problem in the future.)
```
  f6e8730d
- B
  
  pgindent run for 8.3. · fdf5a5ef
  由 Bruce Momjian 提交于 11月 15, 2007
  
  fdf5a5ef
09 11月, 2007 2 次提交

Fix EquivalenceClass code to handle volatile sort expressions in a more · c291203c

由 Tom Lane 提交于 11月 08, 2007

predictable manner; in particular that if you say ORDER BY output-column-ref,
it will in fact sort by that specific column even if there are multiple
syntactic matches.  An example is
	SELECT random() AS a, random() AS b FROM ... ORDER BY b, a;
While the use-case for this might be a bit debatable, it worked as expected
in earlier releases, so we should preserve the behavior for 8.3.  Per my
recent proposal.

While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping
in both directions; it needs this for the same reasons make_sort_from_pathkeys
does.

c291203c

Last week's patch for make_sort_from_pathkeys wasn't good enough: it has · 1be06016

由 Tom Lane 提交于 11月 08, 2007

to be able to discard top-level RelabelType nodes on *both* sides of the
equivalence-class-to-target-list comparison, since make_pathkey_from_sortinfo
might either add or remove a RelabelType. Also fix the latter to do the
removal case cleanly. Per example from Peter.

1be06016

03 11月, 2007 1 次提交

Ensure that EquivalenceClasses generated from ORDER BY keys contain proper · 97ddfc96

由 Tom Lane 提交于 11月 02, 2007

RelabelType nodes when the sort key is binary-compatible with the sort
operator rather than having exactly its input type. We did this correctly
for index columns but not sort keys, leading to failure to notice that
a varchar index matches an ORDER BY request. This requires a bit more work
in make_sort_from_pathkeys, but not anyplace else that I can find.
Per bug report and subsequent discussion.

97ddfc96

25 10月, 2007 1 次提交

Fix an error in make_outerjoininfo introduced by my patch of 30-Aug: the code · 3ef18797

由 Tom Lane 提交于 10月 24, 2007

neglected to test whether an outer join's join-condition actually refers to
the lower outer join it is looking at. (The comment correctly described what
was supposed to happen, but the code didn't do it...) This often resulted in
adding an unnecessary constraint on the join order of the two outer joins,
which was bad enough. However, it also seems to expose a performance
problem in an older patch (from 15-Feb): once we've decided that there is a
join ordering constraint, we will start trying clauseless joins between every
combination of rels within the constraint, which pointlessly eats up lots of
time and space if there are numerous rels below the outer join. That probably
needs to be revisited :-(. Per gripe from Jakub Ouhrabka.

3ef18797

13 10月, 2007 1 次提交

Teach planagg.c that partial indexes specifying WHERE foo IS NOT NULL can be · 106264ca

由 Tom Lane 提交于 10月 13, 2007

used to perform MIN(foo) or MAX(foo), since we want to discard null rows in
the indexscan anyway. (This would probably fall out for free if we were
injecting the IS NOT NULL clause somewhere earlier, but given the current
anatomy of the MIN/MAX optimization code we have to do it explicitly.
Fortunately, very little added code is needed.) Per a discussion with
Henk de Wit.

106264ca

12 10月, 2007 1 次提交

Fix the plan-invalidation mechanism to treat regclass constants that refer to · 82d8ab6f

由 Tom Lane 提交于 10月 11, 2007

a relation as a reason to invalidate a plan when the relation changes. This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan. Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway. Per bug #3662 and subsequent discussion.

82d8ab6f

05 10月, 2007 1 次提交

Keep the planner from failing on "WHERE false AND something IN (SELECT ...)". · 89db887b

由 Tom Lane 提交于 10月 04, 2007

eval_const_expressions simplifies this to just "WHERE false", but we have
already done pull_up_IN_clauses so the IN join will be done, or at least
planned, anyway. The trouble case comes when the sub-SELECT is itself a join
and we decide to implement the IN by unique-ifying the sub-SELECT outputs:
with no remaining reference to the output Vars in WHERE, we won't have
propagated the Vars up to the upper join point, leading to "variable not found
in subplan target lists" error. Fix by adding an extra scan of in_info_list
and forcing all Vars mentioned therein to be propagated up to the IN join
point. Per bug report from Miroslav Sulc.

89db887b

23 9月, 2007 1 次提交

Fix cost estimates for EXISTS subqueries that are evaluated as initPlans · 71256875

由 Tom Lane 提交于 9月 22, 2007

(because they are uncorrelated with the immediate parent query).  We were
charging the full run cost to the parent node, disregarding the fact that
only one row need be fetched for EXISTS.  While this would only be a
cosmetic issue in most cases, it might possibly affect planning outcomes
if the parent query were itself a subquery to some upper query.
Per recent discussion with Steve Crawford.

71256875