- 16 Apr, 2020: 2 commits
Committed by Jinbao Chen
In the past, these system views only counted the values on each segment, and the value on the master was 0. We add some new views that gather the values from the segments to the master. Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
-
Committed by Lisa Owen
* docs - add info about querying the pg_stat_last_operation table
* edits requested by david
-
- 15 Apr, 2020: 6 commits
Committed by Daniel Gustafsson
Various typos spotted in internal in-tree documentation.
-
Committed by xiong-gang
As reported in issue #9790, a 'CTAS with no data' statement doesn't handle the WITH clause; the options in the WITH clause should be added to 'pg_attribute_encoding'.
-
Committed by Hubert Zhang
The SkipData flag should only short-circuit in transientrel_receive on the QE. We should still do the begin/end work, e.g. remove the newly created temp file, or we will leak files. Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
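A minimal Python sketch of the intended control flow, assuming a receiver with separate startup, per-row, and shutdown hooks. The class and method names are hypothetical stand-ins, not the real transientrel_* functions; the point is that the skip-data flag short-circuits only the per-row hook, so the cleanup in the shutdown hook still runs.

```python
import os
import tempfile


class TransientRelReceiver:
    """Hypothetical receiver modelled on a startup/receive/shutdown lifecycle."""

    def __init__(self, skip_data: bool):
        self.skip_data = skip_data
        self.temp_path = None
        self.rows = []

    def startup(self) -> None:
        # Begin work always runs: create the new temporary relation file.
        fd, self.temp_path = tempfile.mkstemp(prefix="transient_")
        os.close(fd)

    def receive(self, row) -> None:
        # Only the per-row work is skipped when no data should be stored.
        if self.skip_data:
            return
        self.rows.append(row)

    def shutdown(self) -> None:
        # End work always runs, so the temporary file is not leaked.
        if self.temp_path is not None and os.path.exists(self.temp_path):
            os.remove(self.temp_path)
        self.temp_path = None
```

Returning early from the whole receiver instead of only from receive() would skip shutdown() as well, which is the leak the commit describes.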
-
Committed by Daniel Gustafsson
-lbz2 is already in LIBS, so remove the redundant entry. Signed-off-by: Adam Lee <ali@pivotal.io>
-
Committed by Daniel Gustafsson
Before this, the top-level configure only set have_yaml, but didn't add -lyaml to LIBS for gpfdist or gpmapreduce. Let Autoconf resolve all the relevant libraries here and place them in $XXX_LIBS variables. Signed-off-by: Adam Lee <ali@pivotal.io>
-
Committed by Shreedhar Hardikar
This bug is particularly evident with queries containing a large array IN clause, e.g. "a IN (1, 3, 5, ...)". As a first step toward improving optimization times for such queries, this commit reduces unnecessary reallocation of histogram buckets while merging the statistics of disjunctive predicates. It improves the performance of the target query, which has 7000 elements in the array comparison, by around 50%. Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io> Co-authored-by: Ashuka Xue <axue@pivotal.io>
-
- 14 Apr, 2020: 1 commit
Committed by Tingfang Bao
Remove CONFIGURE_FLAGS for win32 in compile_gpdb.bash. The configure values are not used at all, so remove the unused code. Authored-by: Tingfang Bao <baotingfang@gmail.com>
-
- 13 Apr, 2020: 1 commit
Committed by dh-cloud
The shared oldestXmin (DistributedLogShared->oldestXmin) may be updated concurrently. It must only ever be moved to a higher value: once a caller advances it to an xmin in a newer distributed log segment, the older segments may be truncated, so a lower xmin written afterwards can point into a segment that no longer exists. For example, txA and txB call DistributedLog_AdvanceOldestXmin concurrently:
```
txA and txB: both hold shared DistributedLogTruncateLock.
txA: set DistributedLogShared->oldestXmin to XminA. TransactionIdToSegment(XminA) = 0009
txB: set DistributedLogShared->oldestXmin to XminB. TransactionIdToSegment(XminB) = 0008
txA: truncate segment 0008, 0007...
```
After that, DistributedLogShared->oldestXmin == XminB, which lies on the removed segment 0008. Subsequent GetSnapshotData() calls will fail because SimpleLruReadPage will error out.
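The fix amounts to making the shared value monotonic. A minimal Python sketch of that rule, assuming plain integer xids (real TransactionIds wrap around and are compared with TransactionIdPrecedes(), which this ignores). The class and method names are hypothetical, and the lock is only a stand-in that makes the read-compare-write step atomic.

```python
import threading


class SharedOldestXmin:
    """Hypothetical model of DistributedLogShared->oldestXmin."""

    def __init__(self, oldest_xmin: int):
        self.oldest_xmin = oldest_xmin
        # Makes the read-compare-write below atomic with respect to other callers.
        self._lock = threading.Lock()

    def advance(self, candidate: int) -> int:
        """Move oldest_xmin forward only; a lower candidate never overwrites it."""
        with self._lock:
            if candidate > self.oldest_xmin:
                self.oldest_xmin = candidate
            return self.oldest_xmin
```

Replaying the interleaving from the commit message: txA advances the value to XminA, and txB's older XminB is then ignored instead of pointing the shared value back into truncated segment 0008.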
-
- 10 Apr, 2020: 3 commits
Committed by Zhenghua Lyu
Previously, bring_to_outer_query and bring_to_singleQE relied on the path->param_info field to determine whether the path could be considered, since we cannot pass params across a motion node. But this is not enough: for example, an index path's param_info field might be NULL while its orderbyclauses reference some outer params. This commit fixes the issue by adding more checks for index paths. See GitHub issue https://github.com/greenplum-db/gpdb/issues/9733 for details.
-
Committed by Shreedhar Hardikar
GPDB 6 introduced a mechanism to distribute tables on columns using a custom hash opclass, instead of using cdbhash. Before this commit, ORCA ignored the distribution opclass, relying on the translator to allow only queries in which all tables were distributed by either their default or default "legacy" opclasses. However, for tables distributed by legacy or default opclasses but joined using a non-default opclass operator, ORCA would produce an incorrect plan, giving wrong results.
This commit fixes that bug by introducing support for tables distributed using non-default opfamilies/opclasses. Even though the support is implemented, it is not fully enabled at this time: the logic to fall back to the planner when the plan contains tables distributed with non-default, non-legacy opclasses remains. Our intention is to support it fully in the future.
How does this work? For hash joins, capture the opfamily of each hash-joinable operator, and use it to create hash distribution spec requests for either side of the join. Scan operators derive a distribution spec based on the opfamily (corresponding to the opclass) of each distribution column. If there is a mismatch between the requested and derived distribution specs, add a Redistribute Motion node using the distribution function from the requested hash opfamily.
The commit consists of several sub-sections:
- Capture distribution opfamilies in CMDRelation and related classes. For each distribution column of the relation, track the opfamily of the "opclass" used in the DISTRIBUTED BY clause. This information is then relayed to CTableDescriptor & CPhysicalScan. Also support this in the other CMDRelation subclasses: CMDRelationCTAS (via CLogicalCTAS) & CMDRelationExternalGPDB.
- Capture the hash opfamily of CMDScalarOp using gpdb::GetCompatibleHashOpFamily(). This is needed to determine distribution spec requests from joins.
- Track hash opfamilies of join predicates. This commit extends the caching of join keys in Hash/Merge joins by also caching the corresponding hash opfamilies of the '=' operators used in those predicates.
- Track the opfamily in CDistributionSpecHashed. This commit also constructs CDistributionSpecHashed with the opfamily information that was previously cached in CScalarGroup in the case of HashJoins. It also includes the compatibility checks that reject distribution specs with mismatched opfamilies in order to produce Redistribute motions.
- Capture the default distribution (hash) opfamily in CMDType.
- Handle legacy opfamilies in CMDScalarOp & CMDType.
- Handle opfamilies in HashExprList Expr->DXL translation.
ORCA-side notes:
1. To ensure correctness, equivalence classes can only be determined over a specific opfamily. For example, the expression `a = b` implies that a & b belong to an equivalence class only for the opfamily that `=` belongs to. Otherwise, the expression `b |=| c` could be used to imply that a & c belong to the same equivalence class, which is incorrect, as the opfamilies of `=` and `|=|` differ. For this commit, equivalence classes are determined only for default opfamilies. This ensures correct behavior for the majority of cases.
2. This commit does *not* implement similar features for merge joins. That is left for future work.
3. This commit introduces two traceflags:
- EopttraceConsiderOpfamiliesForDistribution: if this is off, opfamilies are ignored and set to NULL. This mimics the behavior before this PR. Ctest MDPs are run this way.
- EopttraceUseLegacyOpfamilies: set if ANY distribution column in the query uses a legacy opfamily/opclass. MDCache getters will then return legacy opfamilies instead of the default opfamilies for all queries.
What new information is captured from GPDB?
1. The opfamily of each distribution column in CMDRelation, CMDRelationCtasGPDB & CMDRelationExternalGPDB.
2. The compatible hash opfamily of each CMDScalarOp, using gpdb::GetCompatibleHashOpFamily().
3. The default distribution (hash) opfamily of every type. This may be NULL for some types. It is needed for certain operators (e.g. HashAgg) that request a distribution spec that cannot be inferred in any other way: it cannot be derived, cannot be obtained from any scalar op, etc. See GetDefaultDistributionOpfamilyForType().
4. Legacy opfamilies for types & scalar operators. These are needed to support tables distributed by legacy opclasses.
Other GPDB-side changes:
1. HashExprList no longer carries the type of the expression (it is inferred from the expr instead). However, it now carries the hash opfamily to use when deriving the distribution hash function. To maintain compatibility with older versions, the opfamily is used only if EopttraceConsiderOpfamiliesForDistribution is set; otherwise, the default hash distribution function of the expr's type is used.
2. Don't worry about left & right types in get_compatible_hash_opfamily().
3. Consider COERCION_PATH_RELABELTYPE as binary coercible for ORCA.
4. EopttraceUseLegacyOpfamilies is set if any table is distributed by a legacy opclass.
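A small Python sketch of the matching rule from "How does this work?", assuming a hashed distribution spec can be modelled as a tuple of columns plus one hash opfamily per column. The class, the functions, and the opfamily names are hypothetical; None plays the role of an ignored opfamily, as when EopttraceConsiderOpfamiliesForDistribution is off.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HashedSpec:
    """Hypothetical hashed distribution spec: columns and their hash opfamilies."""
    columns: Tuple[str, ...]
    opfamilies: Tuple[Optional[str], ...]


def satisfies(derived: HashedSpec, requested: HashedSpec) -> bool:
    """A derived spec satisfies a request only if the columns match and no
    tracked opfamily differs (None means opfamilies are being ignored)."""
    if derived.columns != requested.columns:
        return False
    return all(
        d is None or r is None or d == r
        for d, r in zip(derived.opfamilies, requested.opfamilies)
    )


def join_input(derived: HashedSpec, requested: HashedSpec) -> str:
    # On a mismatch, rehash the rows using the requested opfamily's function.
    if satisfies(derived, requested):
        return "no motion"
    return "Redistribute Motion using " + ", ".join(map(str, requested.opfamilies))
```

For instance, a scan hashed on column a with a default integer opfamily does not satisfy a join request on a that uses a different opfamily (such as one containing `|=|`), so a Redistribute Motion is added; before this commit the opfamily was not compared at all.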
-
Committed by Shreedhar Hardikar
This reverts commit 3e45f064.
-
- 09 Apr, 2020: 3 commits
Committed by Zhenghua Lyu
LockRowsPath clears the pathkeys, because when other transactions concurrently update the same relation, the order can no longer be guaranteed. Postgres does not consider a parallel path for a SELECT statement with a locking clause (create_lockrows_path sets parallel_safe to false and parallel_workers to 0). However, Greenplum consists of many segments and is innately parallel. If we simply clear the pathkeys, then when a gather is needed later we will not choose a merge gather, so the data is out of order even when there is no concurrent transaction. See GitHub issue https://github.com/greenplum-db/gpdb/issues/9724. So, just before the final gather, we save the pathkeys and then invoke create_lockrows_path. In the following gather, if saved_pathkeys is not NIL, we create a merge gather. It is also worth mentioning that the conditions under which the code can reach create_lockrows_path are very strict: the query has to be a top-level SELECT statement, the range table has to be a normal heap table, only one table may be involved in the query, and many other conditions must hold (see `checkCanOptSelectLockingClause` for details). Given the above analysis, if the code reaches here and path->pathkeys is not NIL, the following gather has to be the final gather. This is very important, because if it were not the final gather, it might be used by others as a subpath, and its non-NIL pathkeys would break the rules for the lockrows path. We need to keep the pathkeys in the final gather here.
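A toy Python model of the save-and-restore step, assuming paths are simple records with a kind and a list of pathkeys. The Path class and the helper functions are hypothetical and only mirror the roles of create_lockrows_path and the final gather described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    kind: str
    pathkeys: List[str] = field(default_factory=list)
    subpath: Optional["Path"] = None


def create_lockrows_path(subpath: Path) -> Path:
    # As upstream: row locking clears the pathkeys, since concurrent
    # updates could change the order.
    return Path("LockRows", [], subpath)


def make_final_gather(subpath: Path, saved_pathkeys: List[str]) -> Path:
    # With saved pathkeys, merge the sorted per-segment streams;
    # otherwise a plain gather interleaves them in arbitrary order.
    if saved_pathkeys:
        return Path("MergeGather", list(saved_pathkeys), subpath)
    return Path("Gather", [], subpath)


def plan_locking_select(path: Path) -> Path:
    saved_pathkeys = list(path.pathkeys)  # save before LockRows clears them
    return make_final_gather(create_lockrows_path(path), saved_pathkeys)
```

The LockRows node itself keeps empty pathkeys; only the final gather carries the order, which is why this is safe only when that gather is the top of the plan.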
-
Committed by Hubert Zhang
In the past, when a table was created and truncated in the same transaction, Greenplum called heap_truncate_one_rel() to truncate the relation. But AO tables use different segmenting logic, so some of their segment files could not be truncated to zero. Use ao_truncate_one_rel() instead of heap_truncate_one_rel(), and handle the segmenting with ao_foreach_extent_file(). Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io> Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
-
Committed by Ashuka Xue
Add a missing resource so that the ORCA dev pipeline can be run for both master and 6X.
-
- 08 Apr, 2020: 20 commits
Committed by Pengzhou Tang
This flag duplicates 'forceEOS', which can also tell whether errors occurred.
-
Committed by Pengzhou Tang
We have hit interconnect hangs many times in many cases, all with the same pattern: the downstream interconnect motion senders keep sending tuples, blind to the fact that the upstream nodes have finished and quit execution early; the QD then gets enough tuples and waits for all QEs to quit, which causes a deadlock. Many nodes may quit execution early, e.g. LIMIT, HashJoin, Nest Loop. To resolve the hang, they need to stop the interconnect stream explicitly by calling ExecSquelchNode(); however, we cannot do that for rescan cases, in which data might be lost (e.g. commit 2c011ce4). For rescan cases, commit 02213a73 tried using QueryFinishPending to stop the senders, letting them check this flag and quit. That commit has its own problems: first, QueryFinishPending can only be set by the QD, so it doesn't work for INSERT or UPDATE cases; second, it only lets the senders detect the flag and quit the loop in a rude way (without sending EOS to their receiver), so the receiver may still be stuck receiving tuples. This commit first reverts the QueryFinishPending method. To resolve the hang, we move TeardownInterconnect ahead of cdbdisp_checkDispatchResult, which guarantees that the interconnect stream is stopped before waiting for and checking the status of the QEs. For UDPIFC, TeardownInterconnect() removes the ic entries; any packets for this interconnect context are treated as 'past' packets and acked with the STOP flag. For TCP, TeardownInterconnect() closes all connections with its children; the children treat any readable data in the connection, including the closure, as a STOP message. A test case is not included; both commit 2c011ce4 and commit 02213a73 contain one.
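A Python analogy for the ordering change, not a model of the interconnect itself: a bounded queue stands in for the motion channel, a threading.Event for the STOP signal, and joining the thread for cdbdisp_checkDispatchResult. All names are hypothetical. Because the receiver signals STOP before waiting, a sender blocked on a full channel notices it and exits instead of hanging.

```python
import queue
import threading


def motion_sender(channel: queue.Queue, stop: threading.Event) -> None:
    # Keep sending tuples until the receiver says STOP.
    n = 0
    while not stop.is_set():
        try:
            channel.put(n, timeout=0.01)
        except queue.Full:
            continue  # receiver is no longer reading; re-check the STOP flag
        n += 1


def run_limit_query(limit: int):
    channel = queue.Queue(maxsize=4)
    stop = threading.Event()
    sender = threading.Thread(target=motion_sender, args=(channel, stop))
    sender.start()
    rows = [channel.get() for _ in range(limit)]  # e.g. LIMIT has enough rows
    stop.set()              # 1. tear down the stream first ...
    sender.join(timeout=5)  # 2. ... then wait for the sender to finish
    return rows, sender.is_alive()
```

Swapping steps 1 and 2 reproduces the old bug in miniature: join() would wait on a sender that never stops, and only the timeout would end the wait.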
-
Committed by Hubert Zhang
cdb_tidy_message is a GPDB-specific function. It truncates the error message to keep just the first line and copies the other lines to the error detail. JDBC drivers follow the Postgres error format and keep only the error message in the JDBC getWarnings function, so the JDBC driver cannot print the full GPDB error message. It's better to follow the Postgres error format.
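A Python sketch of the behavior being removed, assuming an error is just a message string plus a detail string; tidy_message is a hypothetical stand-in for cdb_tidy_message. A client that shows only the primary message, as described for JDBC getWarnings(), loses everything after the first line.

```python
from typing import Tuple


def tidy_message(message: str) -> Tuple[str, str]:
    """Keep the first line as the message; move the remaining lines to the detail."""
    first_line, _, rest = message.partition("\n")
    return first_line, rest
```

With the function removed, the message is passed through whole, as Postgres does, so a message-only client still sees every line.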
-
Committed by Chris Hajas
The gpexpand tests were going OOM with ORCA in the debug build, as the segments require more memory. Previously, ORCA was not run in the debug build, so this was not an issue. We've bumped the instance up from n1-standard-2 to n1-standard-4, which doubles our memory from 7.5GB to 15GB.
-
Committed by Ashuka Xue
Previously, this test depended on a very specific amount of memory being "eaten" (consumed) before it went OOM. ORCA's use of the GPDB allocators vs. the legacy allocation (which is used in debug builds for leak detection) changes the output, and we have no way to detect whether we're in a debug build, or to change the expected output file accordingly. Now, instead of consuming the same amount of memory each time, the test consumes a small amount and ensures we don't go OOM, then consumes a larger amount and ensures we do go OOM. This hardens the test while keeping its intent.
-
Committed by Chris Hajas
During the behave tests, we change some system limits, such as the memory and overcommit configuration. Previously these were changed while the cluster was running, which caused the container to crash when setting some of these parameters with ORCA in the debug build. Now we stop the cluster, change the limits, and start the cluster again.
-
Committed by Hans Zeller
-
Committed by Hans Zeller
-
Committed by Chris Hajas
This was causing an ORCA assertion to fail and fall back, as we asserted that if a type is not binary coercible, then it has a cast function. Although this code considered domain->base type as binary coercible, it did not consider base type->domain as binary coercible. However, coercion paths of type COERCION_PATH_RELABELTYPE are binary coercible, so we should mark them as such for ORCA.
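A simplified Python sketch of the invariant, assuming the coercion path is reported with the values of Postgres's CoercionPathType enum (COERCION_PATH_NONE, _FUNC, _RELABELTYPE, _ARRAYCOERCE, _COERCEVIAIO). The function names are hypothetical, and the real check has other inputs; the point is that a RELABELTYPE path has no cast function, so the assertion only holds if it counts as binary coercible.

```python
from enum import Enum, auto


class CoercionPath(Enum):
    NONE = auto()
    FUNC = auto()
    RELABELTYPE = auto()
    ARRAYCOERCE = auto()
    COERCEVIAIO = auto()


def is_binary_coercible(path: CoercionPath) -> bool:
    # RELABELTYPE reuses the datum unchanged, e.g. base type -> domain.
    return path is CoercionPath.RELABELTYPE


def cast_is_consistent(path: CoercionPath, cast_func_oid: int) -> bool:
    # The invariant ORCA asserts: a cast that is not binary coercible
    # must carry a cast function (InvalidOid is 0).
    return is_binary_coercible(path) or cast_func_oid != 0
```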
-
Committed by Chris Hajas
These were exposed when running ICW with ORCA asserts enabled. In DeriveJoinStats, EopLogicalFullOuterJoin is also a valid logical join operator. In IDatum, we need to check that doubles are within some epsilon, as we're not passing the full 64-bit IEEE value to ORCA. After fixing the assertion, we would need to regenerate the MDP for MinCardinalityNaryJoin. However, there is no DDL/query for this test, so it is difficult to update. Since it also didn't seem to provide much value, we're removing it.
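A minimal Python sketch of why an epsilon comparison is needed, assuming the loss of precision can be modelled by rounding through a 32-bit float; the real IDatum conversion differs, and the function names and tolerance are hypothetical.

```python
import math
import struct


def lossy_roundtrip(x: float) -> float:
    # Hypothetical stand-in for a conversion that drops low-order bits:
    # round the 64-bit double through a 32-bit float.
    return struct.unpack("f", struct.pack("f", x))[0]


def doubles_match(a: float, b: float, rel_tol: float = 1e-6) -> bool:
    # Compare within a tolerance instead of requiring bit-for-bit equality.
    return math.isclose(a, b, rel_tol=rel_tol)
```

An equality assertion on 0.1 and its round-tripped value fails, while the tolerance check accepts them and still rejects genuinely different values.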
-
Committed by Chris Hajas
With the ORCA source in the GPDB repo, there is no use for Conan or the depends directory anymore. Also, the gpbackup conan file was stale and wasn't being used/supported.
-
Committed by Chris Hajas
-
Committed by Chris Hajas
The main change here is that we completely get rid of the ORCA prod pipeline, as there's no need to create a tag or publish artifacts anymore. We also now build and test the ORCA unit tests during the compile step. Additionally, I've refactored the orca-dev "slack command" pipeline to use the canonical compile_gpdb.bash script and based it on the existing PR pipeline. This simplifies the `test_orca_pipeline.yml` file significantly, and it no longer relies on custom scripts. There are still quite a few files in the concourse directory that are used by `test_explain_pipeline.yml` and a couple of pipelines external to this repo. Some of these aren't actively used and need to be cleaned up or removed, but I'll leave that for later. Also, use https for the xerces-c download and check its hash to ensure a correct download.
-
Committed by Chris Hajas
Previously, we used a config file that was modified by cmake. Instead, use configure to modify macros. Additionally, use a compile definition in cmake so we don't need a separate config file in gpos. This also enables compile-time warnings in the src/backend/gporca files. C++ files in the translator never had these warnings enabled, so they are not enabled in src/backend/gpopt. A block of unused code is removed to get past the compile warnings.
-
Committed by Chris Hajas
This simplifies some of the cmake files further.
-
Committed by Chris Hajas
Cmake should only build targets for use with gporca_test; GNU make should build ORCA within GPDB. Remove the install target to decrease the chance of confusion. Also, add a makefile to the test directory so that make clean works.
-
Committed by Shreedhar Hardikar
ORCA doesn't need a specific version anymore. A query plan will now look like:
```
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=1 width=1)
 Optimizer: Pivotal Optimizer (GPORCA)
(2 rows)
```
Update the regression tests affected by this change. Additionally, use g++ to compile the mock unit tests, since we're linking C++ files.
-
Committed by Shreedhar Hardikar
ORCA previously was built into a shared library using cmake. Now, we build ORCA into the postgres binary.
-
Committed by Shreedhar Hardikar
We no longer need to check for ORCA libraries or version, as ORCA now exists in the same repo and the concept of a version is not needed.
-
Committed by Chris Hajas
ORCA currently exists as a standalone repo. However, building GPDB with ORCA or making changes to ORCA requires significant coordination. With these commits, we get rid of the versioning concept, which will hopefully make both building and developing on ORCA much easier (and you won't hit version mismatch issues anymore). The GPDB build process will now also build ORCA using make. If you want to run the ORCA unit tests, those will continue to use cmake/ctest. Cmake will not install ORCA or build ORCA with GPDB; it is only used to run the ORCA unit tests. This merge commit includes all ORCA commits from when ORCA was open-sourced until f6044123 (v3.97.1). This commit was generated using `git subtree add -P src/backend/gporca git@github.com:greenplum-db/gporca.git HEAD`.
Add 'src/backend/gporca/' from commit 'f6044123'
git-subtree-dir: src/backend/gporca
git-subtree-mainline: b6fbabdb
git-subtree-split: f6044123
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/q4g73r6Y_qA/OpZXDFFwAwAJ
-
- 07 Apr, 2020: 1 commit
Committed by Richard Guo
The DistributionKey for a FULL JOIN path may contain multiple ECs. We need to make sure each of them contains an expr belonging to the given list before concluding that grouping on this set of exprs can be done in place, without a motion. Fixes #9784. Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
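A minimal Python sketch of the check, assuming each equivalence class (EC) is modelled as a set of expression names and that, for illustration, a FULL JOIN on t1.a = t2.b yields a key with one EC per side. The function name and the expression names are hypothetical. Under that assumption, grouping only by t1.a is not enough to avoid a motion; grouping by both sides is.

```python
from typing import Iterable, List, Set


def grouping_is_local(distkey_ecs: List[Set[str]], group_exprs: Iterable[str]) -> bool:
    """True if every equivalence class of the distribution key contains
    at least one of the grouping expressions."""
    wanted = set(group_exprs)
    return all(ec & wanted for ec in distkey_ecs)
```

Checking only that some EC matches would accept the t1.a-only case, which is the kind of mistake the fix guards against.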
-
- 04 Apr, 2020: 1 commit
Committed by Lisa Owen
* docs WIP - add some python3 info
* move data type mapping to pl/python page; add more py3 info
* add set-returning function example
* address comments from david
-
- 03 Apr, 2020: 2 commits
Committed by Ryan Zhang
Co-authored-by: Ryan <ryan@chapterx.com>
-
Committed by Heikki Linnakangas
Without this patch:
select distinct x from xidtab;
ERROR:  could not identify an ordering operator for type xid
LINE 1: select distinct x from xidtab;
                        ^
HINT:  Use an explicit ordering operator or modify the query.
There were two similar but distinct issues:
1. transformDistinctToGroupBy() called addTargetToSortList(), even though the expression might not have ordering operators. There's another function, addTargetToGroupList(), for that; use it.
2. make_distribution_exprs_for_groupclause() tried to create a PathKey on the expression, even if it didn't have ordering operators.
Also update a comment to explain why we do the DISTINCT -> GROUP BY transformation. The comment said it was because DISTINCT doesn't support hashing, but that was already fixed in PostgreSQL 8.4. However, the planner still doesn't know how to perform the "pre-unique" optimization for DISTINCT, so you still get better plans with GROUP BY. This came up when working on the v12 merge: a new upstream test, 'hash_func', contains a query like this involving the 'relacl' column. But it's a pre-existing issue. Fixes https://github.com/greenplum-db/gpdb/issues/9855
Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
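A simplified Python model of the difference between the two helpers, assuming a type is described only by its equality operator, optional ordering operator, and hashability. TypeOps and both functions are hypothetical stand-ins for addTargetToSortList() and addTargetToGroupList(); the xid case is a type that has equality and hashing but, per the error message in this commit, no ordering operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeOps:
    eq_op: Optional[str]
    lt_op: Optional[str]  # ordering operator; None when the type has none
    hashable: bool


def sort_clause_entry(ops: TypeOps) -> dict:
    # Sorting needs an ordering operator.
    if ops.lt_op is None:
        raise ValueError("could not identify an ordering operator")
    return {"eqop": ops.eq_op, "sortop": ops.lt_op}


def group_clause_entry(ops: TypeOps) -> dict:
    # Grouping needs only equality; the ordering operator is optional,
    # and a hashable type can still be grouped by hashing.
    if ops.eq_op is None:
        raise ValueError("could not identify an equality operator")
    return {"eqop": ops.eq_op, "sortop": ops.lt_op, "hashable": ops.hashable}
```

Routing DISTINCT through the group-list path therefore lets such a type be deduplicated by hashing instead of failing while building a sort clause.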
-