- 15 Oct 2020, 1 commit
-
-
Committed by Shreedhar Hardikar
gpfdist uses the global xid & timestamp to distinguish whether each connection belongs to the same external scan. ORCA generates a unique scan number for each ExternalScan within the same plan, but not across plans. So, within a transaction, we may issue multiple external scans that do not get differentiated properly, producing different results. This commit patches that by using a different scan number across plans, just like the planner does. Ideally, gpfdist should also take the command id of the query into account, to prevent this problem in other cases such as prepared statements.
-
- 23 Sep 2020, 1 commit
-
-
Committed by Jesse Zhang
The canonical config file is in src/backend/gporca/.clang-format. I've created two symlinks: one for GPOPT headers, one for GPOPT. This is spiritually a cherry-pick of commit 2f7dd76c, but with the actual code of this branch (6X_STABLE) formatted, of course. (cherry picked from commit 2f7dd76c)
-
- 30 Jul 2020, 1 commit
-
-
Committed by David Kimura
This commit allows Orca to select plans that leverage the IndexOnlyScan node. A new GUC, 'optimizer_enable_indexonlyscan', is used to enable or disable this feature. Index only scan is disabled by default, until the following issues are addressed: 1) Implement a cost comparison model for index only scans. Currently, cost is hard-coded for testing purposes. 2) Support index only scan using GiST and SP-GiST where allowed. Currently, the code only supports index only scans on b-tree indexes. Co-authored-by: Chris Hajas <chajas@vmware.com> (cherry picked from commit 3b72df18)
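Assuming the GUC behaves like other optimizer_enable_* settings, enabling it for a session might look like the following sketch (the table and index names are made up; only the GUC name comes from the commit above):

```sql
-- Illustrative sketch: enable ORCA's index-only-scan support for this session.
SET optimizer = on;
SET optimizer_enable_indexonlyscan = on;

-- Hypothetical table with a b-tree index (the only index type supported so far).
CREATE TABLE events (id int, ts timestamp) DISTRIBUTED BY (id);
CREATE INDEX events_id_idx ON events USING btree (id);

-- With the feature enabled, ORCA may produce an Index Only Scan
-- for queries that touch only indexed columns:
EXPLAIN SELECT id FROM events WHERE id < 100;
```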
-
- 20 Nov 2019, 1 commit
-
-
Committed by Hans Zeller
The corresponding ORCA PR is https://github.com/greenplum-db/gporca/pull/554. Change the check performed when translating an ORCA query to a plan. The old check prohibited ArrayCmp on a btree index. The new check is similar, except that it allows an ArrayCmp on a btree index when it is done in a bitmap index probe. Updated ICG result files and added a new test case.
-
- 01 Jun 2019, 1 commit
-
-
Committed by Chris Hajas
The IMemoryPool interface was removed in ORCA to eliminate an unnecessary abstraction layer and avoid costly casting. Corresponding ORCA commit: e64a2b42. Bumps ORCA version to 3.46.0. Authored-by: Chris Hajas <chajas@pivotal.io>
-
- 01 Feb 2019, 1 commit
-
-
Committed by Heikki Linnakangas
Replace the use of the built-in hashing support for built-in datatypes, in cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time to do this, since we've already made the change to use jump consistent hashing in GPDB 6, so we'll need to deal with the upgrade problems associated with changing the hash functions anyway.

It is no longer enough to track which columns/expressions are used to distribute data; you also need to know the hash function used. For that, a new field is added to gp_distribution_policy, to record the hash operator class used for each distribution key column. In the planner, a new opfamily field is added to DistributionKey, to track that throughout planning.

Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the default hash operator class for the datatype is used. But this patch extends the syntax so that you can specify the operator class explicitly, like "... DISTRIBUTED BY (column opclass)". This is similar to how an operator class can be specified for each column in CREATE INDEX.

To support upgrade, the old hash functions have been converted to special (non-default) operator classes, named cdbhash_*_ops. For example, if you want to use the old hash function for an integer column, you could do "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist of operators that have "compatible" cdbhash functions has been replaced by putting the compatible hash opclasses in the same operator family. (For example, all the legacy integer operator classes, cdbhash_int2_ops, cdbhash_int4_ops and cdbhash_int8_ops, are part of the cdbhash_integer_ops operator family.)

This removes the pg_database.hashmethod field. The hash method is now tracked on a per-table and per-column basis, using the opclasses, so it's not needed anymore.

To help with upgrade from GPDB 5, this introduces a new GUC called 'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash opclasses, instead of the default hash opclasses, if the opclass is not specified explicitly. pg_upgrade will set the new GUC, to force the use of legacy hashops, when restoring the schema dump. It will also set the GUC on all upgraded databases, as a per-database option, so any new tables created after upgrade will also use the legacy opclasses. It seems better to be consistent after upgrade, so that collocation between old and new tables works, for example. The idea is that some time after the upgrade, the admin can reorganize all tables to use the default opclasses instead. At that point, he should also clear the GUC on the converted databases. (Or rather, the automated tool that hasn't been written yet should do that.)

ORCA doesn't know about hash operator classes, or the possibility that we might need to use a different hash function for two columns with the same datatype. Therefore, it cannot produce correct plans for queries that mix different distribution hash opclasses for the same datatype, in the same query. There are checks in the Query->DXL translation, to detect that case, and fall back to the planner. As long as you stick to the default opclasses in all tables, we let ORCA create the plan without any regard to them, and use the default opclasses when translating the DXL plan to a Plan tree. We also allow the case that all tables in the query use the "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the two, or using any non-default opclasses, forces ORCA to fall back.

One curiosity with this is the "int2vector" and "aclitem" datatypes. They have a hash opclass, but no b-tree operators. GPDB 4 used to allow them as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit 56e7c16b. Now they are allowed again, so you can specify an int2vector or aclitem column in DISTRIBUTED BY, but it's still pretty useless, because the planner still can't form EquivalenceClasses on them, will treat them as "strewn" distribution, and won't co-locate joins.

The abstime, reltime and tinterval datatypes don't have default hash opclasses. They are being removed completely in PostgreSQL v12, and users shouldn't be using them in the first place, so instead of adding hash opclasses for them now, we accept that they can't be used as distribution key columns anymore. Add a check to pg_upgrade, to refuse upgrade if they are used as distribution keys in the old cluster. Do the same for the 'money' datatype as well, although that's not being removed upstream.

The legacy hashing code for anyarray in GPDB 5 was actually broken. It could produce a different hash value for two arrays that are considered equal, according to the = operator, if there were differences in e.g. whether the null bitmap was stored or not. Add a check to pg_upgrade, to reject the upgrade if array types were used as distribution keys. The upstream hash opclass for anyarray works, though, so it is OK to use arrays as distribution keys in new tables. We just don't support binary upgrading them from GPDB 5. (See github issue https://github.com/greenplum-db/gpdb/issues/5467.) The legacy hashing of 'anyrange' had the same problem, but that was new in GPDB 6, so we don't need a pg_upgrade check for that.

This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE INDEX, so that you can no longer create a situation where a non-hashable column becomes the distribution key. (Fixes github issue https://github.com/greenplum-db/gpdb/issues/6317)

Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
Co-authored-by: Chris Hajas <chajas@pivotal.io>
Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Reviewed-by: Ning Yu <nyu@pivotal.io>
Reviewed-by: Simon Gao <sgao@pivotal.io>
Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Yandong Yao <yyao@pivotal.io>
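The syntax and GUC described above can be sketched as follows (the table and database names are made up; the opclass and GUC names come from the commit message):

```sql
-- Default: the datatype's default hash opclass is used.
CREATE TABLE t_default (intcol int) DISTRIBUTED BY (intcol);

-- Explicit opclass, analogous to per-column opclasses in CREATE INDEX:
-- here the legacy GPDB 5 hash function is requested for the column.
CREATE TABLE t_legacy (intcol int) DISTRIBUTED BY (intcol cdbhash_int4_ops);

-- Per-database setting of the kind pg_upgrade applies, so that tables
-- created without an explicit opclass keep using the legacy hash
-- functions after upgrade.
ALTER DATABASE mydb SET gp_use_legacy_hashops = on;
```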
-
- 15 Dec 2018, 1 commit
-
-
Committed by Heikki Linnakangas
This removes a lot of GPDB-specific code that was used to deal with dynamic scans, and code duplication between nodes dealing with heap, AO and AOCS tables.

* Resurrect the SeqScan node. We had replaced it with TableScan in GPDB. Teach SeqScan to also work on append-only and AOCS tables, and remove TableScan and all the code changes that were made in GPDB earlier to deal with all three table types.

* Merge the BitmapHeapScan, BitmapAppendOnlyScan, and BitmapTableScan node types. They're all BitmapHeapScans now. We used to use BitmapTableScans in ORCA-generated plans, and BitmapHeapScan/BitmapAppendOnlyScan in planner-generated plans, and there was no good reason for the difference. The "heap" part in the name is a bit misleading, but I prefer to keep the upstream name, even though it now handles AO tables as well. It's more like the old BitmapTableScan now, which also handled all three table types, but the code is refactored to stay as close to upstream as possible.

* Introduce DynamicBitmapHeapScan. BitmapTableScan used to perform dynamic scans too; now that's the responsibility of the new DynamicBitmapHeapScan plan node, just like we have DynamicTableScan and DynamicIndexScan as wrappers around SeqScans and IndexScans.

* Get rid of BitmapAppendOnlyPath in the planner, too. Use BitmapHeapPath also for AO tables.

* Refactor the way Dynamic Table Scan works. A Dynamic Table Scan node is now just a thin wrapper around SeqScan. It initializes a new SeqScan executor node for every partition, and lets it do the actual scanning. It now works the same way that I refactored Dynamic Index Scans to work in commit 198f701e. This allowed removing a lot of code that we used to use for both Dynamic Index Scans and Dynamic Table Scans, but is no longer used.

There's now some duplication in the Dynamic* nodes, to walk through the partitions. They all have a function called setPidIndex(), for example, which does the same thing. But I think it's much clearer this way than the previous DynamicController stuff. We could perhaps extract some of the code to common helper functions, but I think this is OK for now.

This also fixes issue #6274. I'm not sure what exactly the bug was, but it was clearly in the Bitmap Table Scan code that is used with ORCA-generated plans. Now that we use the same code for plans generated with the Postgres planner and ORCA, it's not surprising that the bug is gone.

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
-
- 16 Aug 2018, 2 commits
-
-
Committed by Heikki Linnakangas
Commit d334b016 changed the name of this argument, but got the name wrong in this check. Because it was in a GPOS_BLOCK(), it would only compile, and throw the error, with an assertion-enabled ORCA build.
-
Committed by Bhuvnesh Chaudhary
As part of moving away from Hungarian notation in the GPORCA codebase, the integration points between GPORCA and GPDB in the translator have been renamed to the new convention used in GPORCA. The libraries currently updated to the new notation in GPORCA are Naucrates and GPOS. The new naming convention is a custom version of common C++ naming conventions. The style guide for this convention can be found in the GPORCA repository. Also bump ORCA version to 2.69.0. Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io> Co-authored-by: Melanie Plageman <mplageman@pivotal.io> Co-authored-by: Ekta Khanna <ekhanna@pivotal.io> Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io> Co-authored-by: Sambitesh Dash <sdash@pivotal.io> Co-authored-by: Dhanashree Kashid <dkashid@pivotal.io> Co-authored-by: Omer Arap <oarap@pivotal.io>
-
- 15 Aug 2018, 2 commits
-
-
Committed by Bhuvnesh Chaudhary
This reverts commit 2a38a9cd.
-
Committed by Bhuvnesh Chaudhary
As part of moving away from Hungarian notation in the GPORCA codebase, the integration points between GPORCA and GPDB in the translator have been renamed to the new convention used in GPORCA. The libraries currently updated to the new notation in GPORCA are Naucrates and GPOS. The new naming convention is a custom version of common C++ naming conventions. The style guide for this convention can be found in the GPORCA repository. Also bump ORCA version to 2.69.0. Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io> Co-authored-by: Melanie Plageman <mplageman@pivotal.io> Co-authored-by: Ekta Khanna <ekhanna@pivotal.io> Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io> Co-authored-by: Sambitesh Dash <sdash@pivotal.io> Co-authored-by: Dhanashree Kashid <dkashid@pivotal.io> Co-authored-by: Omer Arap <oarap@pivotal.io>
-
- 01 Aug 2018, 1 commit
-
-
Committed by Bhuvnesh Chaudhary
Translate nestparams passed from ORCA to create the nestparams node in Nested Loop joins. This feature can be enabled by setting the trace flag EopttraceEnableNestLoopParams. Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io> Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 21 Nov 2017, 1 commit
-
-
Committed by Heikki Linnakangas
Much of the code and structs used by index scans and bitmap index scans had been fused together and refactored in GPDB, to share code between dynamic index scans and regular ones. However, it would be nice to keep upstream code unchanged as much as possible. To that end, refactor the executor code for dynamic index scans and dynamic bitmap index scans, to reduce the diff vs upstream.

The Dynamic Index Scan executor node is now a thin wrapper around the regular Index Scan node, even thinner than before. When a new Dynamic Index Scan begins, we don't do much initialization at that point. When the scan begins, we initialize an Index Scan node for the first partition, and return rows from it until it's exhausted. On the next call, the underlying Index Scan is destroyed, and a new Index Scan node is created, for the next partition, and so on. Creating and destroying the IndexScanState for every partition adds some overhead, but it's not significant compared to all the other overhead of opening and closing the relations, building scan keys etc.

Similarly, a Dynamic Bitmap Index Scan executor node is just a thin wrapper for a regular Bitmap Index Scan. When MultiExecDynamicBitmapIndexScan() is called, it initializes a BitmapIndexScanState for the current partition, and calls it. On ReScan, the BitmapIndexScan executor node for the old partition is shut down. A Dynamic Bitmap Index Scan differs from a Dynamic Index Scan in that a Dynamic Index Scan is responsible for iterating through all the active partitions, while a Dynamic Bitmap Index Scan works as a slave for the Dynamic Bitmap Heap Scan node above it.

It'd be nice to do a similar refactoring for heap scans, but that's for another day.
-
- 25 Sep 2017, 1 commit
-
-
Committed by Heikki Linnakangas
It wasn't very useful. ORCA and Postgres both just stack WindowAgg nodes on top of each other, and no one's been unhappy about that, so we might as well do that, too. This reduces the difference between GPDB and the upstream implementation, and will hopefully make it smoother to switch. Rename the Window plan node type to WindowAgg, to match upstream, now that it is fairly close to the upstream version.
-
- 17 Sep 2017, 1 commit
-
-
Committed by Heikki Linnakangas
In GPDB, we have so far used a WindowFrame struct to represent the start and end window bounds in a ROWS/RANGE BETWEEN clause, while PostgreSQL uses the combination of a frameOptions bitmask and start and end expressions. Refactor to replace the WindowFrame with the upstream representation.
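For context, the ROWS/RANGE BETWEEN clause in question is the standard SQL window-frame syntax; a minimal illustrative query (table and column names are made up):

```sql
-- A 3-row moving sum: the frame clause below is what used to be carried
-- in the WindowFrame struct, and is now represented by a frameOptions
-- bitmask plus start/end expressions, as in upstream PostgreSQL.
SELECT ts,
       sum(amount) OVER (
           ORDER BY ts
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_sum
FROM sales;
```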
-
- 04 Sep 2017, 1 commit
-
-
Committed by Heikki Linnakangas
Planner and ORCA translator both implemented the same logic, to assign external table URIs to segments. But I spotted one case where the logic differed:

CREATE EXTERNAL TABLE exttab_with_on_master( i int, j text )
LOCATION ('file://@hostname@@abs_srcdir@/data/exttab_few_errors.data')
ON MASTER FORMAT 'TEXT' (DELIMITER '|');

SELECT * FROM exttab_with_on_master;
ERROR:  'ON MASTER' is not supported by this protocol yet.

With ORCA you got a less user-friendly error:

set optimizer=on;
set optimizer_enable_master_only_queries = on;
postgres=# explain SELECT * FROM exttab_with_on_master;
ERROR:  External scan error: Could not assign a segment database for external file (CTranslatorDXLToPlStmt.cpp:472)

The immediate cause of that was that commit fcf82234 didn't remember to modify the ORCA translator's copy of the same logic. But really, it's silly and error-prone to duplicate the code, so modify ORCA to use the same code that the planner does.
-
- 17 Aug 2017, 1 commit
-
-
Committed by Heikki Linnakangas
This allows removing all the code in CTranslatorDXLToPlStmt that tracked the parent of each call. I found the plan node IDs awkward when I was hacking on CTranslatorDXLToPlStmt. I tried to make a change where a function would construct a child Plan node first, and a Result node on top of that, but only if necessary, depending on the kind of child plan. The parent plan node IDs made it impossible to construct a part of the Plan tree like that, in a bottom-up fashion, because you always had to pass the parent's ID when constructing a child node. Now that is possible.
-
- 19 Jul 2017, 1 commit
-
-
Committed by Bhuvnesh Chaudhary
This commit introduces a new operator for ValuesScan. Earlier, we generated a `UNION ALL` for cases where the VALUES lists passed are all constants, but now a new operator, CLogicalConstTable, with an array of const tuples is generated. Once the plan is generated by ORCA, it is translated to a ValuesScan node in GPDB. This enhancement helps significantly in improving the total run time of queries involving values scans with const values in ORCA. Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
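The kind of query affected is an all-constant VALUES list, for example (an illustrative sketch; the column aliases are made up):

```sql
-- All rows here are constants, so ORCA can represent the list as a single
-- CLogicalConstTable and translate it to one ValuesScan node in GPDB,
-- instead of generating a UNION ALL of one-row subplans.
SELECT * FROM (VALUES (1, 'red'),
                      (2, 'green'),
                      (3, 'blue')) AS colors(id, name);
```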
-
- 04 Apr 2017, 1 commit
-
-
Committed by Heikki Linnakangas
It's an error in standard C - at least in older standards - to typedef the same type more than once, even if the definition is the same. Newer versions of gcc don't complain about it, but you can see the warnings with -pedantic (among a ton of other warnings; search for "redefinition"). To fix, remove the duplicate typedefs.

The ones in src/backend/gpopt and src/include/gpopt were actually OK, because a duplicate typedef is OK in C++, and those files are compiled with a C++ compiler. But many of the typedefs in those files were not used for anything, so I nevertheless removed duplicate ones there too, where they caught my eye.

In gpmon.h, we were redefining apr_*_t types when postgres.h had been included. But as far as I can tell, that was always the case - all the files that included gpmon.h included postgres.h directly or indirectly before it. Search & replace the references to apr_*_t types in that file with the postgres equivalents, to make it clearer what they actually are.
-
- 04 Aug 2016, 1 commit
-
-
Committed by foyzur
* Fixing a DXL Translator bug where we lose canSetTag during Query object mutation and the translator ends up using the wrong canSetTag. * Adding an ICG test for verifying that the ORCA translator uses the correct canSetTag.
-
- 13 Jul 2016, 1 commit
-
-
Committed by foyzur
* Preventing multiple ResLockPortal calls for the same portal when running multiple queries via PortalRunMulti, by correctly populating canSetTag in PlannedStmt from the Query object during DXL to PlannedStmt translation. * ICG tests for checking that ORCA correctly populates canSetTag.
-
- 19 May 2016, 1 commit
-
-
- 24 Nov 2015, 1 commit
-
-
Committed by Venkatesh Raghavan
-
- 28 Oct 2015, 1 commit
-
-