1. 22 Sep 2017, 1 commit
    • K
      Enable ORCA to be tracked by Mem Accounting · 669dd279
      Committed by Kavinder Dhaliwal
      Before this commit, all memory allocations made by ORCA/GPOS were a
      black box to GPDB. However, the groundwork was already in place to allow
      GPDB's Memory Accounting Framework to track memory consumption by ORCA.
      This commit introduces two new functions,
      Ext_OptimizerAlloc and Ext_OptimizerFree, which
      pass their parameters through to gp_malloc and gp_free and do some bookkeeping
      against the Optimizer Memory Account. This introduces very little
      overhead to the GPOS memory management framework.
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      669dd279
  2. 21 Sep 2017, 4 commits
    • H
      Fix bug in handling re-scan of a hash join. · f7101d98
      Committed by Heikki Linnakangas
      The WITH RECURSIVE test case in 'join_gp' would miss some rows, if
      the hash algorithm (src/backend/access/hash/hashfunc.c) was replaced
      with the one from PostgreSQL 8.4, or if statement_mem was lowered from
      1000 kB to 700 kB. This is what happened:
      
      1. A tuple belongs to batch 0, and is kept in memory during processing
         batch 0.
      
      2. The outer scan finishes, and we spill the inner batch 0 from memory
         to a file, with SpillFirstBatch, and start processing batch 1.
      
      3. While processing batch 1, the number of batches is increased, and
         the tuple that belonged to batch 0, and was already written to
         batch 0's file, is moved to a later batch.
      
      4. After the first scan is complete, the hash join is re-scanned
      
      5. We reload the batch file 0 into memory. While reloading, we encounter
         the tuple that now doesn't seem to belong to batch 0, and throw it
         away.
      
      6. We perform the rest of the re-scan. We have missed any matches to the
         tuple that was thrown away. It was not part of the later batch files,
         because in the first pass, it was handled as part of batch 0. But in
         the re-scan, it was not handled as part of batch 0, because nbatch was
         now larger, so it didn't belong there.
      
      To fix this, when we encounter a tuple while reloading a batch file that
      actually belongs to a later batch file, we write it to that later file. To
      avoid adding it there multiple times if the hash join is re-scanned multiple
      times, whenever any tuples are moved while reloading a batch file, we destroy
      the batch file and re-create it with just the remaining tuples.
      
      This is made a bit complicated by the fact that BFZ temp files don't support
      appending to a file that's already been rewound for reading. So what we
      actually do is always re-create the batch file, even if there have been no
      changes to it. I left comments about that. Ideally, we would either support
      re-appending to BFZ files, or stop using BFZ workfiles for this
      altogether (I'm not convinced they're any better than plain BufFiles). But
      that can be done later.
      
      Fixes github issue #3284
      f7101d98
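      A minimal sketch of how the problem conditions can be provoked, per the
      description above (the table and recursive query here are hypothetical; the
      real case is the WITH RECURSIVE test in 'join_gp'):
      ```
      SET statement_mem = '700kB';  -- small budget, so the inner hash table spills into many batches
      WITH RECURSIVE r(i) AS (
          SELECT 1
          UNION ALL
          SELECT t.i FROM big_table t JOIN r ON t.i = r.i + 1
      )
      SELECT count(*) FROM r;
      ```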
    • H
      Fix CURRENT OF to work with PL/pgSQL cursors. · 91411ac4
      Committed by Heikki Linnakangas
      It only worked for cursors declared with DECLARE CURSOR, before. You got
      a "there is no parameter $0" error if you tried. This moves the decision
      on whether a plan is "simply updatable", from the parser to the planner.
      Doing it in the parser was awkward, because we only want to do it for
      queries that are used in a cursor, and for SPI queries, we don't know it
      at that time yet.
      
      For some reason, the copy, out, read-functions of CurrentOfExpr were missing
      the cursor_param field. While we're at it, reorder the code to match
      upstream.
      
      This only makes the required changes to the Postgres planner. ORCA has never
      supported updatable cursors. In fact, it will fall back to the Postgres
      planner on any DECLARE CURSOR command, so that's why the existing tests
      have passed even with optimizer=off.
      91411ac4
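      A minimal sketch of the case this enables (table and function names are
      hypothetical); before this commit, the UPDATE ... WHERE CURRENT OF below
      failed with the "$0" parameter error:
      ```
      CREATE TABLE accounts (id int, balance numeric) DISTRIBUTED BY (id);
      
      CREATE FUNCTION bump_current_row() RETURNS void AS $$
      DECLARE
          cur CURSOR FOR SELECT * FROM accounts;  -- a "simply updatable" cursor
          r   record;
      BEGIN
          OPEN cur;
          FETCH cur INTO r;  -- position the cursor on a row
          UPDATE accounts SET balance = balance + 1 WHERE CURRENT OF cur;
          CLOSE cur;
      END;
      $$ LANGUAGE plpgsql;
      ```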
    • A
      Make gp_replication.conf for USE_SEGWALREP only. · b7ce6930
      Committed by Ashwin Agrawal
      The intent of this extra configuration file is to control the
      synchronization between primary and mirror for WALREP.
      
      The gp_replication.conf is not designed to work with filerep; for
      example, scripts like gp_expand will fail since they directly modify
      the configuration files instead of going through initdb.
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
      b7ce6930
    • H
      Add support for CREATE FUNCTION EXECUTE ON [MASTER | ALL SEGMENTS] · aa148d2a
      Committed by Heikki Linnakangas
      We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting
      prodataaccess='s'. This exposes the functionality to users via DDL, and adds
      support for the EXECUTE ON MASTER case.
      
      There was discussion on gpdb-dev about also supporting ON MASTER AND ALL
      SEGMENTS, but that is not implemented yet. There is no handy "locus" in the
      planner to represent that. There was also discussion about making a
      gp_segment_id column implicitly available for functions, but that is also
      not implemented yet.
      
      The old behavior was that if a function was marked as
      IMMUTABLE, it could be executed anywhere. Otherwise it was always executed
      on the master. For backwards-compatibility, this keeps that behavior for
      EXECUTE ON ANY (the default), so even if a function is marked as EXECUTE ON
      ANY, it will always be executed on the master unless it's IMMUTABLE.
      
      There is no support for these new options in ORCA. Using any ON MASTER or
      ON ALL SEGMENTS functions in a query causes ORCA to fall back. This is the
      same as with the prodataaccess='s' hack that this replaces, but now that it
      is more user-visible, it would be nice to teach ORCA about it.
      
      The new options are only supported for set-returning functions, because for
      a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how
      the results should be combined. ON MASTER would probably be doable, but
      there's no need for that right now, so punt.
      
      Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can
      only be used in the FROM clause, or in the target list of a simple SELECT
      with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM
      foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY
      functions, which is the default, work the same as before.
      aa148d2a
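      A sketch of the new DDL and of the placement rules described above (the
      function body and table names are illustrative):
      ```
      CREATE FUNCTION seg_rows() RETURNS SETOF text AS $$
          SELECT 'a row produced on a segment'::text;
      $$ LANGUAGE sql EXECUTE ON ALL SEGMENTS;
      
      SELECT * FROM seg_rows();        -- allowed: used in the FROM clause
      SELECT seg_rows();               -- allowed: simple SELECT with no FROM clause
      -- SELECT seg_rows() FROM foo;   -- rejected: must be in the FROM clause when the query has one
      ```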
  3. 19 Sep 2017, 4 commits
  4. 18 Sep 2017, 1 commit
    • H
      Using fault name instead of enum as the key of fault hash table (#3249) · 4616d3ec
      Committed by Huan Zhang
      Using fault name instead of enum as the key of fault hash table
      
      The GPDB fault injector uses the fault enum as the key of the fault hash table.
      If someone wants to inject faults into GPDB extensions (a separate repo),
      they have to hard-code the extension-related fault enums into GPDB core
      code, which is not a good practice.
      So we simply use the fault name as the hash key to remove the need to
      hard-code the fault enum. Note that the fault injector API doesn't change.
      4616d3ec
  5. 17 Sep 2017, 2 commits
    • H
      Convert WindowFrame to frameOptions + start + end · ebf9763c
      Committed by Heikki Linnakangas
      In GPDB, we have so far used a WindowFrame struct to represent the start
      and end window bounds in a ROWS/RANGE BETWEEN clause, while PostgreSQL
      uses the combination of a frameOptions bitmask and start and end
      expressions. Refactor to replace the WindowFrame with the upstream
      representation.
      ebf9763c
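      For reference, an illustrative frame clause whose bounds are now carried as a
      frameOptions bitmask plus start and end expressions (table and columns are
      hypothetical):
      ```
      SELECT id,
             sum(amount) OVER (ORDER BY id
                               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS running_sum
      FROM sales;
      ```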
    • H
      Hardcode the "frame maker" function for LEAD and LAG. · 686aab95
      Committed by Heikki Linnakangas
      This removes the pg_window.winframemakerfunc column. It was only used for
      LEAD/LAG, and only in the Postgres planner. Hardcode the same special
      handling for LEAD/LAG in planwindow.c instead, based on winkind.
      
      This is one step in refactoring the planner and executor further, to
      replace the GPDB implementation of window functions with the upstream
      one.
      686aab95
  6. 15 Sep 2017, 5 commits
    • H
      Make it possible to build without libbz2, also on non-Windows. · d6749c3c
      Committed by Heikki Linnakangas
      The bzip2 library is only used by the gfile/fstream code, used for external
      tables and gpfdist. The usage of bzip2 was in #ifndef WIN32 blocks, so it
      was only built on non-Windows systems.
      
      Instead of tying it to the platform, use a proper autoconf check and
      HAVE_LIBBZ2 flags. This makes it possible to build gpfdist with bzip2
      support on Windows, as well as building without bzip2 on non-Windows
      systems. That makes it easier to test the otherwise Windows-only codepaths
      on other platforms. --with-libbz2 is still the default, but you can now use
      --without-libbz2 if you wish.
      
      I'm sure that some regression tests will fail if you actually build the
      server without libbz2, but I'm not going to address that right now. We have
      similar problems with other features that are in principle optional, but
      cause some regression tests to fail.
      
      Also use "#ifdef HAVE_LIBZ" rather than "#ifndef WIN32" to enable/disable
      zlib support in gpfdist. Building the server still fails if you use
      --without-zlib, but at least you can build the client programs without
      zlib, also on non-Windows systems.
      
      Remove obsolete copy of bzlib.h from the repository while we're at it.
      d6749c3c
    • H
      Stop supporting SQL type aliases in ALTER TYPE SET DEFAULT ENCODING. · b4f125bd
      Committed by Heikki Linnakangas
      This is a bit unfortunate, in case someone is using them. But as it
      happens, we haven't even mentioned the ALTER TYPE SET DEFAULT ENCODING
      command in the documentation, so there probably aren't many people using
      them, and you can achieve the same thing by using the normal, non-alias,
      names like "varchar" instead of "character varying".
      b4f125bd
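      An illustrative sketch, assuming the usual column-encoding options: spell out
      the canonical type name rather than the SQL alias.
      ```
      -- still accepted: canonical name
      ALTER TYPE varchar SET DEFAULT ENCODING (compresstype=zlib, compresslevel=1);
      -- no longer accepted: SQL standard alias
      -- ALTER TYPE character varying SET DEFAULT ENCODING (compresstype=zlib, compresslevel=1);
      ```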
    • H
      Move extended grouping processing to after transforming window functions. · 0b246cec
      Committed by Heikki Linnakangas
      This way we don't need the weird half-transformation of WindowDefs. Makes
      things simpler.
      0b246cec
    • A
      Remove gp_fault_strategy catalog table and corresponding code. · f5b5c218
      Committed by Ashwin Agrawal
      Using the gp_segment_configuration catalog table we can easily find out whether
      mirrors exist or not; we do not need a special table to communicate the same
      thing. Earlier, gp_fault_strategy used to convey 'n' for a mirrorless system,
      'f' for replication and 's' for SAN mirrors. Since support for 's' was removed
      in 5.0, the only purpose gp_fault_strategy served was to indicate whether the
      system is mirrored or not. Hence, delete the gp_fault_strategy table and, where
      required, use gp_segment_configuration to find the needed info.
      f5b5c218
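      An illustrative query of the kind that replaces the dropped table: mirrors
      exist if any entry in gp_segment_configuration has the mirror role.
      ```
      SELECT EXISTS (SELECT 1 FROM gp_segment_configuration WHERE role = 'm') AS has_mirrors;
      ```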
    • O
      Only request stats of columns needed for cardinality estimation [#150424379] · c5ade96d
      Committed by Omer Arap
      GPORCA should not spend time extracting column statistics that are not
      needed for cardinality estimation. This commit eliminates the unnecessary
      overhead of requesting and generating statistics for columns that are not
      used in cardinality estimation.
      
      E.g:
      `CREATE TABLE foo (a int, b int, c int);`
      
      For table foo, the query below only needs stats for column `a`, which
      is the distribution column, and column `c`, which is used in the
      where clause.
      `select * from foo where c=2;`
      
      However, prior to this commit, the column statistics for column `b` were
      also calculated and passed for cardinality estimation. The only
      information needed by the optimizer is the `width` of column `b`. For
      this tiny piece of information, we transfer all the stats information for
      that column.
      
      This commit and its counterpart commit in GPORCA ensures that the column
      width information is passed and extracted in the `dxl:Relation` metadata
      information.
      
      Preliminary results for short-running queries provide up to a 65x
      performance improvement.
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
      c5ade96d
  7. 14 Sep 2017, 3 commits
    • H
      Remove unused ENABLE_LTRACE code. · d994b38e
      Committed by Heikki Linnakangas
      Although I'm not too familiar with SystemTap, I'm pretty sure that recent
      versions can do user space tracing better. I don't think anyone is using
      these hacks anymore, so remove them.
      d994b38e
    • N
      Refactor resource group source code. · d145bd11
      Committed by Ning Yu
      * resgroup: move MyResGroupSharedInfo into MyResGroupProcInfo.
      
      MyResGroupSharedInfo is now replaced with MyResGroupProcInfo->group.
      
      * resgroup: retire resGranted in PGPROC.
      
      when resGranted == false we must have resSlotId == InvalidSlotId,
      when resGranted != false we must have resSlotId != InvalidSlotId,
      so we can retire resGranted and keep only resSlotId.
      
      * resgroup: rename sharedInfo to group.
      
      in resgroup.c there used to be both `group` and `sharedInfo` for the
      same thing, now only use `group`.
      
      * resgroup: rename MyResGroupProcInfo to self.
      
      We want to use this variable directly so a short name is better.
      d145bd11
    • D
      Use built-in JSON parser for PXF fragments (#3185) · 9f4497fd
      Committed by Daniel Gustafsson
      * Use built-in JSON parser for PXF fragments
      
      Instead of relying on an external library, use the built-in JSON
      parser in the backend for the PXF fragments parsing. Since this
      replaces the current implementation with an event-based callback
      parser, the code is more complicated, but dogfooding the parser
      that we want extension writers to use is a good thing.
      
      This removes the dependency on json-c from autoconf, and enables
      building PXF on Travis for extra coverage.
      
      * Use elog for internal errors, and ereport for user errors
      
      Internal errors, where we are interested in the source filename, should
      use elog(), which will decorate the error messages automatically
      with this information. The connection error is interesting for the
      user, however, so use ereport() there instead.
      9f4497fd
  8. 13 Sep 2017, 1 commit
    • A
      Add support for piping COPY to/from an external program. · c415415a
      Committed by Adam Lee
          commit 3d009e45
          Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
          Date:   Wed Feb 27 18:17:21 2013 +0200
      
              Add support for piping COPY to/from an external program.
      
              This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
              psql \copy syntax. Like with reading/writing files, the backend version is
              superuser-only, and in the psql version, the program is run in the client.
      
              In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if
              the stdin/stdout is quoted, it's now interpreted as a filename. For example,
              "\copy foo from 'stdin'" now reads from a file called 'stdin', not from
              standard input. Before this, there was no way to specify a filename called
              stdin, stdout, pstdin or pstdout.
      
              This creates a new function in pgport, wait_result_to_str(), which can
              be used to convert the exit status of a process, as returned by wait(3),
              to a human-readable string.
      
              Etsuro Fujita, reviewed by Amit Kapila.
      Signed-off-by: Adam Lee <ali@pivotal.io>
      Signed-off-by: Ming LI <mli@apache.org>
      c415415a
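      Illustrative usage of the new syntax (paths and commands are hypothetical; the
      backend form is superuser-only):
      ```
      COPY foo FROM PROGRAM 'gunzip -c /tmp/foo.csv.gz' CSV;
      COPY foo TO PROGRAM 'gzip > /tmp/foo.csv.gz' CSV;
      ```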
  9. 12 Sep 2017, 2 commits
    • S
      Fix wrong results for NOT-EXISTS sublinks with aggs & LIMIT · 087a7175
      Committed by Shreedhar Hardikar
      During NOT EXISTS sublink pullup, we create a one-time false filter when
      the sublink contains aggregates, without checking for limitcount. However,
      in situations where the sublink contains an aggregate with LIMIT 0, we
      should not generate such a filter, as it produces incorrect results.
      
      Added regress test.
      
      Also, initialize all the members of IncrementVarSublevelsUp_context
      properly.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      087a7175
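      An illustrative query of the affected shape, with hypothetical tables: the
      sublink contains an aggregate and LIMIT 0, so the subquery returns no rows and
      every row of t1 should satisfy the NOT EXISTS; the one-time false filter
      wrongly discarded them all.
      ```
      SELECT *
      FROM t1
      WHERE NOT EXISTS (SELECT max(t2.b) FROM t2 WHERE t2.a = t1.a LIMIT 0);
      ```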
    • H
      Split WindowSpec into separate before and after parse-analysis structs. · 789f443d
      Committed by Heikki Linnakangas
      In the upstream, two different structs are used to represent a window
      definition: WindowDef in the grammar, which is transformed into
      WindowClause during parse analysis. In GPDB, we've been using the same
      struct, WindowSpec, in both stages. Split it up, to match the upstream.
      
      The representation of the window frame, i.e. "ROWS/RANGE BETWEEN ..." was
      different between the upstream implementation and the GPDB one. We now use
      the upstream frameOptions+startOffset+endOffset representation in raw
      WindowDef parse node, but it's still converted to the WindowFrame
      representation for the later stages, so WindowClause still uses that. I
      will switch over the rest of the codebase to the upstream representation as
      a separate patch.
      
      Also, refactor WINDOW clause deparsing to be closer to upstream.
      
      One notable difference is that the old WindowSpec.winspec field corresponds
      to the winref field in WindowDef and WindowClause, except that the new
      'winref' is 1-based, while the old field was 0-based.
      
      Another noteworthy thing is that this forbids specifying "OVER (w
      ROWS/RANGE BETWEEN ...", if the window "w" already specified a window frame,
      i.e. a different ROWS/RANGE BETWEEN. There was one such case in the
      regression suite, in window_views, and this updates the expected output of
      that to be an error.
      789f443d
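      An illustrative example of the now-rejected case (table is hypothetical):
      window "w" already defines a frame, and the OVER clause tries to specify
      another one.
      ```
      SELECT sum(x) OVER (w ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)  -- now an error
      FROM t
      WINDOW w AS (ORDER BY x ROWS UNBOUNDED PRECEDING);
      ```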
  10. 11 Sep 2017, 1 commit
    • H
      Run autoheader. · d2d19823
      Committed by Heikki Linnakangas
      pg_config.h.in is generated by autoheader, but it had fallen out of date.
      Has everyone just been adding stuff manually to it, or how did this happen?
      In any case, let's run it now.
      d2d19823
  11. 09 Sep 2017, 1 commit
  12. 08 Sep 2017, 4 commits
    • H
      Refactor window function syntax checks to match upstream. · f819890b
      Committed by Heikki Linnakangas
      Mostly, move the responsibilities of the check_call() function to the
      callers, transformAggregateCall() and transformWindowFuncCall().
      
      This fixes one long-standing, albeit harmless, bug. Previously, you got an
      "Unexpected internal error" instead of a user-friendly syntax error if you
      tried to use a window function in the WHERE clause of a DELETE statement
      (see the example below). Add a test case for that.
      
      Move a few similar tests from 'olap_window_seq' to 'qp_olap_windowerr'.
      Seems like a more appropriate place for them. Also, 'olap_window_seq' has
      an alternative expected output file for ORCA, so it's nice to keep tests
      that produce the same output with or without ORCA out of there. Also add a
      test query for creating an index on an expression containing a window
      function. There was a test for that already, but it was missing parens
      around the expression, and therefore produced an error already in the
      grammar.
      f819890b
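      An illustrative statement (table is hypothetical) that now produces a
      user-friendly error about window functions in WHERE, instead of the old
      internal error, as mentioned above:
      ```
      DELETE FROM events WHERE row_number() OVER (ORDER BY ts) > 100;
      ```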
    • A
      Make DatumGetPointer() and PointerGetDatum() macros. · 9c8e0816
      Committed by Ashwin Agrawal
      As inline functions, these were producing warnings; based on the discussion at
      https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/6fgKvN9QpV4/zjysjqIZAgAJ
      convert them to macros, as in upstream. Add explicit type casts wherever
      needed, now that DatumGetPointer() returns (char *) instead of (void *).
      9c8e0816
    • P
      Mark DateTimeParseError() noreturn · d38b79c1
      Committed by Peter Eisentraut
      This avoids a warning from clang 3.2 about an uninitialized variable
      'dtype' in date_in().
      d38b79c1
    • H
      Remove dead code hasModifyingCTE · ed0e1830
      Committed by Haisheng Yuan and Jesse Zhang
      This commit removes Query::hasModifyingCTE and
      ParseState::p_hasModifyingCTE because they are dead code.
      
      This change impacts reading and writing `pg_rewrite` rules, which is how
      views are implemented, and hence won't be backported to 5.0 or earlier. A
      `pg_upgrade` from 5 to 6 will still work because this change has no DDL
      surface.
      ed0e1830
  13. 07 Sep 2017, 9 commits
    • H
      Decide whether a window edge is "delayed" later, in the executor. · 23b262f2
      Committed by Heikki Linnakangas
      In principle, it makes sense to determine at plan time whether the
      expression needs to be re-evaluated for every row. In practice, it seems
      simpler to decide that in the executor, when initializing the Window node.
      This allows removing a bunch of code from the planner, and from the ORCA
      translator, including the hack to force the expression to be delayed if it
      was a SubLink.
      
      The planner always set the delayed flag, unless the expression was a Const.
      We can easily and quickly check for that in the executor too. I'm not sure
      how ORCA decided whether to delay or not, but in some quick testing I
      cannot come up with a case where it would decide differently.
      23b262f2
    • H
      Refactor code that selects a common type for columns in a UNION query. · 352362a6
      Committed by Heikki Linnakangas
      The big difference is that each leaf query is now transformed in one go,
      like it's done in the upstream, instead of transforming the target list
      and FROM list first. That partial transformation was causing trouble for
      another refactoring that I'm working on, which will change the way window
      functions are handled in parse analysis.
      
      This two-pass code is GPDB-specific, PostgreSQL uses a simpler algorithm
      that works bottom-up, one setop node at a time, to select the column types.
      352362a6
    • H
      Remove remnants of "EXCLUDE [CURRENT ROW|GROUP|TIES|NO OTHERS]" syntax. · 646cdc60
      Committed by Heikki Linnakangas
      It hasn't been implemented, but there is basic support in the grammar,
      just enough to detect the syntax and throw an error or ignore it. All the
      rest was dead code.
      646cdc60
    • D
      Enable ORCA to use IndexScan on Leaf Partitions · dae6849f
      Currently ORCA does not support index scan on leaf partitions. It only supports
      index scan if we query the root table. This commit along with the corresponding
      ORCA changes adds support for using indexes when leaf partitions are queried
      directly.
      
      When a root table that has indexes (either homogenous/complete or
      heterogenous/partial) is queried, the Relcache Translator sends index
      information to ORCA. This enables ORCA to generate an alternative plan with
      a Dynamic Index Scan on all partitions (in case of a homogenous index) or a
      plan with partial scans, i.e. a Dynamic Table Scan on leaf partitions that
      don’t have indexes plus a Dynamic Index Scan on leaf partitions with indexes
      (in case of a heterogenous index).
      
      This is a two step process in Relcache Translator as described below:
      
      Step 1 - Get list of all index oids
      
      `CTranslatorRelcacheToDXL::PdrgpmdidRelIndexes()` performs this step and it
      only retrieves indexes on root and regular tables; for leaf partitions it bails
      out.
      
      Now for the root, the list of index oids is nothing but the index oids on its
      leaf partitions. For instance:
      
      ```
      CREATE TABLE foo ( a int, b int, c int, d int) DISTRIBUTED by (a) PARTITION
      BY RANGE(b) (PARTITION p1 START (1) END (10) INCLUSIVE, PARTITION p2 START (11)
      END (20) INCLUSIVE);
      
      CREATE INDEX complete_c on foo USING btree (c); CREATE INDEX partial_d on
      foo_1_prt_p2 using btree(d);
      ```
      The index list will look like { complete_c_1_prt_p1, partial_d }.
      
      For a complete index, the index oid of the first leaf partition is retrieved.
      If there are partial indexes, all the partial index oids are retrieved.
      
      Step 2 - Construct Index Metadata object
      
      `CTranslatorRelcacheToDXL::Pmdindex()` performs this step.
      
      For each index oid retrieved in Step #1 above; construct an Index Metadata
      object (CMDIndexGPDB) to be stored in metadata cache such that ORCA can get all
      the information about the index.
      Along with all other information about the index, `CMDIndexGPDB` also contains
      a flag `fPartial` which denotes if the given index is homogenous (if yes, ORCA
      will apply it to all partitions selected by partition selector) or heterogenous
      (if yes, the index will be applied to only appropriate partitions).
      The process is as follows:
      ```
      	Foreach oid in index oid list :
      		Get index relation (rel)
      		If rel is a leaf partition :
      			Get the root rel of the leaf partition
      			Get all	the indexes on the root (this will be same list as step #1)
      			Determine if the current index oid is homogenous or heterogenous
      			Construct CMDIndexGPDB based appropriately (with fPartial, part constraint,
      			defaultlevels info)
      		Else:
      			Construct a normal CMDIndexGPDB object.
      ```
      
      Now for leaf partitions, there is no notion of homogenous or heterogenous
      indexes, since a leaf partition is like a regular table. Hence, in `Pmdindex()`
      we should not check whether the index is complete or not.
      
      Additionally, whether a given index is homogenous or heterogenous needs to be
      decided from the perspective of the relation we are querying (such as the root
      or a leaf).
      
      Hence, the right place for the `fPartial` flag is the relation metadata object
      (CMDRelationGPDB) and not the independent index metadata object (CMDIndexGPDB).
      This commit makes the following changes to support index scans on leaf partitions
      along with partial scans:
      
      Relcache Translator:
      
      In Step 1, retrieve the index information on the leaf partition and create a
      list of CMDIndexInfo objects which contain the index oid and the `fPartial` flag.
      Step 1 is the place where we know what relation we are querying, which enables
      us to determine whether or not the index is homogenous from the context of the
      relation.
      
      The relation metadata tag will look like the following after this change:
      
      Before:
      ```
      	<dxl:Indexes>
      		<dxl:Index Mdid="0.17159874.1.0"/>
      		<dxl:Index Mdid="0.17159920.1.0"/>
      	</dxl:Indexes>
      ```
      
      After:
      ```
      	<dxl:IndexInfoList>
      		<dxl:IndexInfo Mdid="0.17159874.1.0" IsPartial="true"/>
      		<dxl:IndexInfo Mdid="0.17159920.1.0" IsPartial="false"/>
      	</dxl:IndexInfoList>
      
      ```
      
      A new class `CMDIndexInfo` has been created in ORCA which contains the index
      mdid and the `fPartial` flag. For external tables, normal tables and leaf
      partitions, the `fPartial` flag will always be false.
      
      Hence, in the end, the relcache translator will provide the list of indexes
      defined on leaf partitions when they are queried directly, with `fPartial`
      always false. And when the root table is queried, `fPartial` will be set
      appropriately based on the completeness of the index. ORCA will refer to the
      relation metadata for the fPartial information and not to the independent
      index metadata object.
      
      [Ref ##120303669]
      dae6849f
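      For example, with the foo table and indexes defined above, a direct query
      against a leaf partition can now consider an index scan (the predicate is
      illustrative):
      ```
      SELECT * FROM foo_1_prt_p2 WHERE d = 42;
      ```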
    • J
      Un-hide recursive CTE on master [#150861534] · 20152cbf
      Committed by Jesse Zhang
      We will be less conservative and enable recursive CTE by default on
      master, rather than keeping recursive CTE hidden while we progress on
      developing the feature.
      
      This reverts the following two commits:
      * 280c577a "Set gp_recursive_cte_prototype GUC to true in test"
      * 4d5f8087 "Guard Recursive CTE behind a GUC"
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      20152cbf
    • K
      Guard Recursive CTE behind a GUC · 4d5f8087
      Committed by Kavinder Dhaliwal
      While recursive CTE is still being developed, it will be hidden from users
      behind the GUC gp_recursive_cte_prototype.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      4d5f8087
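      For reference, the in-development feature can then be exposed in a session by
      turning on the GUC named above:
      ```
      SET gp_recursive_cte_prototype = on;
      ```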
    • T
      Fix up ruleutils.c for CTE features. The main problem was that · 8e4b2f67
      Committed by Tom Lane
      get_name_for_var_field didn't have enough context to interpret a reference to
      a CTE query's output.  Fixing this requires separate hacks for the regular
      deparse case (pg_get_ruledef) and for the EXPLAIN case, since the available
      context information is quite different.  It's pretty nearly parallel to the
      existing code for SUBQUERY RTEs, though.  Also, add code to make sure we
      qualify a relation name that matches a CTE name; else the CTE will mistakenly
      capture the reference when reloading the rule.
      
      In passing, fix a pre-existing problem with get_name_for_var_field not working
      on variables in targetlists of SubqueryScan plan nodes.  Although latent all
      along, this wasn't a problem until we made EXPLAIN VERBOSE try to print
      targetlists.  To do this, refactor the deparse_context_for_plan API so that
      the special case for SubqueryScan is all on ruleutils.c's side.
      
      (cherry picked from commit 742fd06d)
      8e4b2f67
    • F
      Supporting ReScan of HashJoin with Spilled HashTable (#2770) · 391e9ea7
      Committed by foyzur
      To support RecursiveCTE we need to be able to ReScan a HashJoin as many times as the recursion depth. The HashJoin was previously ReScannable only if it has one memory-resident batch. Now, we support ReScannability for more than one batch. The approach that we took is to keep the inner batch files around for more than the duration of a single iteration of join if we detect that we need to reuse the batch files for rescanning. This can also improve the performance of the subplan as we no longer need to materialize and rebuild the hash table. Rather, we can just reload the batches from their corresponding batch files.
      
      To accomplish reloading of inner batch files, we keep the inner batch files around even if the outer is joined as we wait for the reuse in subsequent rescan (if rescannability is desired).
      
      The corresponding mail thread is here: https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/Rescannability$20of$20HashJoin%7Csort:relevance/gpdb-dev/E5kYU0FwJLg/Cqcxx0fOCQAJ
      
      Contributed by Haisheng Yuan, Kavinder Dhaliwal and Foyzur Rahman
      391e9ea7
    • H
      Bring in recursive CTE to GPDB · fd61a4ca
      Committed by Haisheng Yuan
      The planner generates a plan that doesn't insert any motion between a
      WorkTableScan and its corresponding RecursiveUnion, because currently in GPDB
      motions are not rescannable. For example, an MPP plan for a recursive CTE query
      may look like:
      ```
      Gather Motion 3:1
         ->  Recursive Union
               ->  Seq Scan on department
                     Filter: name = 'A'::text
               ->  Nested Loop
                     Join Filter: d.parent_department = sd.id
                     ->  WorkTable Scan on subdepartment sd
                     ->  Materialize
                           ->  Broadcast Motion 3:3
                                 ->  Seq Scan on department d
      ```
      
      For the current solution, the WorkTableScan is always put on the outer side of
      the topmost Join (the recursive part of the RecursiveUnion), so that we can
      safely rescan the inner child of the join without worrying about the
      materialization of a potential underlying motion. This is a heuristic-based
      plan, not a cost-based plan.
      
      Ideally, the WorkTableScan can be placed on either side of the join with any
      depth, and the plan should be chosen based on the cost of the recursive plan
      and the number of recursions. But we will leave it for later work.
      
      Note: The hash join is temporarily disabled for plan generation of the recursive
      part, because if the hash table spills, the batch file is going to be removed
      as the join executes. We have a follow-up story to enable a spilled hash table
      to be rescannable.
      
      See discussion at gpdb-dev mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/s_SoXKlwd6I
      fd61a4ca
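      An illustrative recursive query of the shape that yields the plan above
      (column names are inferred from the plan's filter and join conditions):
      ```
      WITH RECURSIVE subdepartment AS (
          SELECT * FROM department WHERE name = 'A'
        UNION ALL
          SELECT d.*
          FROM department d
          JOIN subdepartment sd ON d.parent_department = sd.id
      )
      SELECT * FROM subdepartment;
      ```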
  14. 06 Sep 2017, 2 commits
    • H
      Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Committed by Heikki Linnakangas
      If a prepared statement, or a cached plan for an SPI query e.g. from a
      PL/pgSQL function, contains stable functions, the stable functions were
      incorrectly evaluated only once at plan time, instead of on every execution
      of the plan. This happened to not be a problem in queries that contain any
      parameters, because in GPDB, they are re-planned on every invocation
      anyway, but non-parameter queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated all
      stable functions at planning time. Change it to false, and also rename it
      back to 'estimate', as it's called in the upstream. That flag was changed
      back in 2010, in order to allow partition pruning to work with qual
      containing stable functions, like TO_DATE. I think back then, we always
      re-planned every query, so that was OK, but we do cache plans now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so that
      it still does evaluate stable functions, even when the 'estimate' flag is
      off. But when it does so, mark the plan as "one-off", meaning that it must
      be re-planned on every execution. That gives the old, intended, behavior,
      that such plans are indeed re-planned, but it still allows plans that don't
      use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
      ccca0af2
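      An illustrative parameterless prepared statement (table is hypothetical) of
      the kind that was broken: the stable function must now be evaluated at
      execution time rather than folded into the cached plan.
      ```
      PREPARE recent AS SELECT * FROM events WHERE created_at < now();  -- now() is stable
      EXECUTE recent;
      EXECUTE recent;  -- re-evaluates now(), instead of reusing a value frozen at plan time
      ```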
    • H
      Refactor the way seqserver host and port are stored. · 208a3cad
      Committed by Heikki Linnakangas
      They're not really per-portal settings, so it doesn't make much sense
      to pass them to PortalStart. And most of the callers were passing
      savedSeqServerHost/Port anyway. Instead, set the "current" host and port
      in postgres.c, when we receive them from the QD.
      208a3cad