- 09 Nov, 2020 7 commits
-
-
Committed by Gang Xiong
-
Committed by xiaoxiao
* refactor gpload test file TEST.py
  1. migrate gpload test to pytest
  2. new function to form the config file through the yaml package and make it more reasonable
  3. add a case to cover the gpload update_condition argument
* migrate gpload and TEST.py to python3.6
  new test case 43 to test gpload behavior when a column name has capital letters and no data type
  change some ans files since psql reacts differently
* change the sql that finds a reusable external table to make gpload compatible with gp7 and gp6
  improve TEST.py to write the config file with the ruamel.yaml module
Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>
-
Committed by Ning Yu
The user can adjust the ic-proxy peer addresses at runtime and reload them by sending SIGHUP. If an address is modified or removed, the corresponding peer connection must be closed or reestablished. The same applies to the peer listener: if the listener port is changed, the listener must be set up again.
-
Committed by Ning Yu
The peer addresses are specified with the GUC gp_interconnect_proxy_addresses, which can be reloaded on SIGHUP. We used to care only about newly added addresses; however, the user can also modify them, or even remove some of them. So we now classify the addresses after parsing the GUC, so that we can tell whether an address was added, removed, or modified. The handling of the classified addresses will be done in the next commit.
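The classification step can be sketched roughly as follows. This is a Python illustration only, not the C code in ic-proxy: the `PeerAddr` type, the `parse_addresses` and `classify` helpers, and the comma-separated `dbid:content:host:port` entry format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerAddr:
    """Hypothetical parsed entry; the field names are assumptions."""
    dbid: int
    content: int
    host: str
    port: int


def parse_addresses(guc_value):
    """Parse an assumed 'dbid:content:host:port,...' string into PeerAddr items."""
    addrs = []
    for item in filter(None, (s.strip() for s in guc_value.split(","))):
        dbid, content, rest = item.split(":", 2)
        host, port = rest.rsplit(":", 1)
        addrs.append(PeerAddr(int(dbid), int(content), host, int(port)))
    return addrs


def classify(old, new):
    """Return (added, removed, modified) by comparing entries keyed on (dbid, content)."""
    old_by_key = {(a.dbid, a.content): a for a in old}
    new_by_key = {(a.dbid, a.content): a for a in new}
    added = [a for k, a in new_by_key.items() if k not in old_by_key]
    removed = [a for k, a in old_by_key.items() if k not in new_by_key]
    modified = [a for k, a in new_by_key.items()
                if k in old_by_key and old_by_key[k] != a]
    return added, removed, modified
```

On a reload, entries in `removed` and `modified` are the ones whose peer connections would have to be closed or reestablished, while unchanged entries can be left alone.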
-
Committed by Ning Yu
We used to scan the whole address list to find our own address; now we record it directly when parsing the addresses.
-
Committed by Ning Yu
An ICProxyAddr variable is usually named "addr", so the attribute is referred to as "addr->addr", which is confusing and sometimes ambiguous. The attribute is therefore renamed to "sockaddr", and the function ic_proxy_extract_addr() is renamed to ic_proxy_extract_sockaddr().
-
Committed by Peifeng Qiu
ProcessCopyOptions checks the option list of COPY command. It's also called by external table when text/csv format is used. It's better not to mix external table specific options here and check them separately. Checking custom protocol here is not necessary because it's checked when parsing location urls in GenerateExtTableEntryOptions anyway.
-
- 07 Nov, 2020 3 commits
-
-
Committed by Kalen Krempely
This commit does the following:
1. Extract config_primaries_for_replication to be used by both gpaddmirrors and gprecoverseg.
2. gprecoverseg: add replication entries for primaries
3. gprecoverseg: add support for --hba-hostnames
-
Committed by Kalen Krempely
-
Committed by Bhuvnesh Chaudhary
This commit does the following:
1. Removes the calls related to gphost cache.
2. Uses ping to validate if the hostname could be resolved.
-
- 05 Nov, 2020 4 commits
-
-
Committed by xiong-gang
When stashing a 'VisimapDelete' to the buffile, we must seek to the end of the last physical file if the buffile contains multiple files. This commit cherry-picks part of the following upstream commit:

commit 808e13b282efa7e7ac7b78e886aca5684f4bccd3
Author: Amit Kapila <akapila@postgresql.org>
Date: Wed Aug 26 07:36:43 2020 +0530

    Extend the BufFile interface.

    Allow BufFile to support temporary files that can be used by the single backend when the corresponding files need to be survived across the transaction and need to be opened and closed multiple times. Such files need to be created as a member of a SharedFileSet.

    Additionally, this commit implements the interface for BufFileTruncate to allow files to be truncated up to a particular offset and extends the BufFileSeek API to support the SEEK_END case. This also adds an option to provide a mode while opening the shared BufFiles instead of always opening in read-only mode.

    These enhancements in BufFile interface are required for the upcoming patch to allow the replication apply worker, to handle streamed in-progress transactions.

    Author: Dilip Kumar, Amit Kapila
    Reviewed-by: Amit Kapila
    Tested-by: Neha Sharma
    Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
-
Committed by Hans Zeller
This is a cherry-pick of the change from PR https://github.com/greenplum-db/gporca/pull/607

Avoid costing change for IN predicates on btree indexes

Commit e5f1716 changed the way we handle IN predicates on indexes: it now uses a more efficient array comparison instead of treating the predicate like an OR predicate. A side effect is that the cost function, CCostModelGPDB::CostBitmapTableScan, now goes through a different code path, using the "small NDV" or "large NDV" costing method. This produces very high cost estimates when the NDV increases beyond 2, so we basically never choose an index for these cases, although a btree index used in a bitmap scan isn't very sensitive to the NDV.

To avoid this, we go back to the old formula we used before commit e5f1716. The fix is restricted to IN predicates on btree indexes, used in a bitmap scan.

Add an MDP for a larger IN list, using a btree index on an AO table

Misc. changes to the calibration test program
- Added tests for btree indexes (btree_scan_tests).
- Changed data distribution so that all column values range from 1...n.
- Parameter values for test queries are now proportional to selectivity; a parameter value of 0 produces a selectivity of 0%.
- Changed the logic to fake statistics somewhat; hopefully this will lead to more precise estimates. Incorporated the changes to the data distribution with no more 0 values. Added fake stats for unique columns.
- Headers of tests now use semicolons to separate parts, to give a nicer output when pasting into Google Docs.
- Some formatting changes.
- Log fallbacks.
- When using existing tables, the program now determines the table structure (heap or append-only) and the row count.
- Split off two very slow tests into separate test units. These are not included when running "all" tests; they have to be run explicitly.
- Add btree join tests, rename "bitmap_join_tests" to "index_join_tests" and run both bitmap and btree joins
- Update min and max parameter values to cover a range that includes or at least is closer to the cross-over between index and table scan
- Remove the "high NDV" tests, since the ranges in the general test now include both low and high NDV cases (<= and > 200)
- Print out selectivity of each query, if available
- Suppress standard deviation output when we execute queries only once
- Set search path when connecting
- Decrease the parameter range when running bitmap scan tests on heap tables
- Run btree scan tests only on AO tables, they are not designed for testing index scans

Updates to the experimental cost model, new calibration

1. Simplify some of the formulas; the calibration process seemed to justify that. We might have to revisit if problems come up. Changes:
   - Rewrite some of the formulas so the costs per row and costs per byte are easier to see
   - Make the cost for the width directly proportional
   - Unify the formula for scans and joins, use the same per-byte costs and make NDV-dependent costs proportional to num_rebinds * dNDV, except for the logic in item 3.

   That makes the cost for the new experimental cost model a simple linear formula:

       num_rebinds * ( rows * c1 + rows * width * c2 + ndv * c3 + bitmap_union_cost + c4 ) + c5

   We have 5 constants, c1 ... c5:
       c1: cost per row (rows on one segment)
       c2: cost per byte
       c3: cost per distinct value (total NDV on all segments)
       c4: cost per rebind
       c5: initialization cost
       bitmap_union_cost: see item 3 below

2. Recalibrate some of the cost parameters, using the updated calibration program src/backend/gporca/scripts/cal_bitmap_test.py

3. Add a cost penalty for bitmap index scans on heap tables. The added cost takes the form

       bitmap_union_cost = <base table rows> * (NDV-1) * c6

   The reason for this is, as others have pointed out, that heap tables lead to much larger bit vectors, since their CTIDs are more spaced out than those of AO tables. The main factor seems to be the cost of unioning these bit vectors, and that cost is proportional to the number of bitmaps minus one and the size of the bitmaps, which is approximated here by the number of rows in the table. Note that because we use (NDV-1) in the formula, this penalty does not apply to usual index joins, which have an NDV of 1 per rebind. This is consistent with what we see in measurements and it also seems reasonable, since we don't have to union bitmaps in this case.

4. Fix to select CostModelGPDB for the 'experimental' model, as we do in 5X.

5. Calibrate the constants involved (c1 ... c6), using the calibration program and running experiments with heap and append-only tables on a laptop and also on a Linux cluster with 24 segments. Also run some other workloads for validation.

6. Give a small initial advantage to bitmap scans, so they will be chosen over table scans for small tables. Otherwise, small queries will have more or less random plans, all of which cost around 431, the value of the initial cost. Added a 10% advantage of the bitmap scan.

* Port calibration program to Python 3
  - Used 2to3 program to do the basics.
  - Version parameter in argparse no longer supported
  - Needs additional option in connection string to keep the search path
  - The dbconn.execSQL call can no longer be used to get a cursor, this was probably a non-observable defect in the Python 2 version
  - Needed to use // (floor division) in some cases

Co-authored-by: David Kimura <dkimura@vmware.com>
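The linear formula and the heap-table penalty from items 1 and 3 can be written out as a small Python sketch. The function names and the `is_heap` flag are hypothetical, and the constants passed in are placeholders rather than the calibrated values used by ORCA; the 10% bitmap-scan advantage from item 6 is not modeled here.

```python
def bitmap_union_cost(base_table_rows, ndv, c6, is_heap):
    """Penalty for unioning bitmaps; per item 3 it applies to heap tables only."""
    if not is_heap:
        return 0.0
    # (ndv - 1) unions are needed, each roughly proportional to the table size.
    return base_table_rows * (ndv - 1) * c6


def experimental_bitmap_cost(num_rebinds, rows, width, ndv, union_cost,
                             c1, c2, c3, c4, c5):
    """Linear cost formula from item 1 (c1..c5 are placeholder constants)."""
    per_rebind = rows * c1 + rows * width * c2 + ndv * c3 + union_cost + c4
    return num_rebinds * per_rebind + c5
```

With an NDV of 1 per rebind, as in a usual index join, `bitmap_union_cost` is zero even on a heap table, which matches the note in item 3.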
-
Committed by Hans Zeller
* Improve partition elimination when indexes are present (port from 6X)

* Use original join pred for DPE with index nested loop joins

  Dynamic partition selection is based on a join predicate. For index nested loop joins, however, we push the join predicate to the inner side and replace the join predicate with "true". This meant that we couldn't do DPE for nested index loop joins. This commit remembers the original join predicate in the index nested loop join, to be used in the generated filter map for DPE. The original join predicate needs to be passed through multiple layers.

* SPE for index preds

  Some of the xforms use method CXformUtils::PexprRedundantSelectForDynamicIndex to duplicate predicates that could be used both as index predicates and as partition elimination predicates. The call was missing in some other xforms. Added it.

* Changes to equivalent distribution specs with redundant predicates

  Adding redundant predicates causes some issues with generating equivalent distribution specs, to be used for the outer table of a nested index loop join. We want the equivalent spec to be expressed in terms of outer references, which are the columns of the outer table. By passing in the outer refs, we can ensure that we won't replace an outer ref in a distribution spec with a local variable from the original distribution spec.

  Also removed the asserts in CPhysicalFilter::PdsDerive that ensure the distribution spec is complete (consisting of only columns from the outer table) after we see a select node. Even without my changes, the asserts do not always hold, as this test case shows:

      drop table if exists foo, bar;
      create table foo(a int, b int, c int, d int, e int) distributed by(a,b,c);
      create table bar(a int, b int, c int, d int, e int) distributed by(a,b,c);
      create index bar_ixb on bar(b);
      set optimizer_enable_hashjoin to off;
      set client_min_messages to log;
      -- runs into assert
      explain select * from foo join bar on foo.a=bar.a and foo.b=bar.b where bar.c > 10 and bar.d = 11;

  Instead of the asserts, we now use the new method of passing in the outer refs to ensure that we move towards completion. We also know now that we can't always achieve a complete distribution spec, even without redundant predicates.

* MDP changes

  Various changes to MDPs:
  - New SPE filters used in plan
  - New redundant predicates (partitioning or on non-partitioning columns)
  - Plan space changes
  - Cost changes
  - Motion changes
  - Regenerated, because plan switched to a hash join, so used a guc to force an index plan
  - Fixed lookup failures
  - Add mdp where we try unsuccessfully to complete a distribution spec

* ICG result changes
  - Test used the 'experimental' cost model to force an index scan, but we now get the index scan even with the default cost model (plans currently fall back).

This is a cherry-pick of commit 4a7a6821 from the 6X_STABLE branch.
-
Committed by Bhuvnesh Chaudhary
Previously the error output was sent to /dev/null, so the information showing the cause of the error was lost. Redirect the error output to the log file.
-
- 04 Nov, 2020 9 commits
-
-
Committed by Jesse Zhang
-
Committed by Jesse Zhang
They have the same no-op implementation as the overridden base method, and they aren't called anywhere.
-
Committed by Jesse Zhang
The method called doesn't actually do any printing, and we don't assert on the output. Removing the call doesn't even change the program output.
-
Committed by Jesse Zhang
-
Committed by Jesse Zhang
While working on extracting a common implementation of DbgPrint() into a mixin (commit forthcoming), I ran into the curious phenomenon that is the non-const CMemo::OsPrint. I almost dropped the requirement that DbgPrint requires "OsPrint() const", before realizing that the root cause is CSyncList has non-const Next() and friends. And that could be easily fixed. Make it so. While we're at it, also fixed a fairly obvious omission in CMemo::OsPrint where the output stream parameter was unused. We output to an unrelated "auto" stream instead. This was probably never noticed because we were relying on the assumption that streams are always connected to standard output.
-
Committed by Jesse Zhang
These are classes that are only implementing an OsPrint method just so that they can have a debug printing facility. They are not overriding anything from a base class, so the "virtual" was just a bad habit. Remove them.
-
Committed by Jesse Zhang
I looked through the history, this class was dead on arrival and *never* used. Ironically, we kept adding #include for its header over the years to places that didn't use the class.
-
Committed by Jesse Zhang
Mortality date, in chronological order:
gpos/memory/ICache.h: Added in 2010, orphaned in 2011 (private commit) CL 90194
gpopt/utils/COptClient.h and gpopt/utils/COptServer.h: Added in 2012, orphaned in 2015 (private commit) MPP-25631
gpopt/base/CDrvdPropCtxtScalar.h: Dead on arrival when added in 2013
gpos/error/CAutoLogger.h: Added in 2012, orphaned in 2014 (private commit) CL 189022
unittest/gpos/task/CWorkerPoolManagerTest.h: Added in 2010, orphaned in 2019 in commit 61c7405a "Remove multi-threading code" (greenplum-db/gporca#510)
unittest/gpos/task/CAutoTaskProxyTest.h wasn't removed in commit 61c7405a, probably because there was a reference to it in CWorkerPoolManagerTest.h, which was also left behind (chained orphaning).
-
Committed by Gang Xiong
commit '5f7cdc' didn't update the answer file expand_table.out
-
- 03 Nov, 2020 1 commit
-
-
Committed by xiong-gang
Co-authored-by: Gang Xiong <gangx@vmware.com>
-
- 02 Nov, 2020 1 commit
-
-
Committed by Ning Wu
The default branch of the repo https://github.com/greenplum-db/greenplum-database-release has been changed from master to main. This commit syncs up with that change.
Co-authored-by: Ning Wu <ningw@vmware.com>
Co-authored-by: Shaoqi Bai <bshaoqi@vmware.com>
-
- 31 Oct, 2020 2 commits
-
-
Committed by Abhijit Subramanya
Commit 4bbbb381 introduced some hardening around concurrent drop and recreate of tables while analyzedb is running but it failed to take into account the code around updating the last operation performed. This commit fixes it.
-
Committed by David Yozie
-
- 30 Oct, 2020 6 commits
-
-
Committed by Chen Mulong
The error was introduced by dc96f667. If `set -u` was called before sourcing greenplum_path.sh with bash, the error `ZSH_VERSION: unbound variable` would be reported. To solve the issue, use the shell parameter expansion `${ZSH_VERSION:-}`, which expands to an empty value if the variable is unset. Tested with zsh, bash and dash.
-
Committed by (Jerome)Junfeng Yang
The QD tracks, per libpq connection, whether the QE wrote xlog (wrote_xlog). If the QE writes xlog, it sends a libpq message to the QD, but that message is sent in ReadyForQuery. Before the QE executes this function, it may already have sent results back to the QD. When the QD then processes the message, it does not read the new wrote_xlog value, so the connection still holds the wrote_xlog value from the previous dispatch, which affects whether one-phase commit is chosen. The issue only happens when the QE flushes the libpq message before the ReadyForQuery function, so it is hard to find a test case that covers it. I found the issue while experimenting with code that sends some information from the QE to the QD; it broke the gangsize test, which shows the commit info.
-
Committed by Chen Mulong
The generated greenplum_path.sh env file previously contained bash-specific syntax, so it errored out if the user's shell was zsh. zsh doesn't have BASH_SOURCE; "${(%):-%x}" is a similar replacement for zsh. Also try to support other shells with some command combinations. Tested with bash/zsh/dash.
-
Committed by David Yozie
* Docs - update interconnect proxy discussion to cover hostname support
* Change gp_interconnect_type -> gp_interconnect_proxy_addresses in note
-
Committed by Lisa Owen
-
Committed by dh-cloud
Looking at the GP documents, there is no indication that the master dbid must be 1. However, during CREATE_QD_DB, gpinitsystem always writes "gp_dbid=1" into the file `internal.auto.conf`, even if we specify:
```
mdw~5432~/data/master/gpseg-1~2~-1
OR
mdw~5432~/data/master/gpseg-1~0~-1
```
The catalog gp_segment_configuration, however, can have the correct master dbid value (2 or 0), and the mismatch causes gpinitsystem to hang. Users can run into this problem the first time they use gpinitsystem -I.

Here we test dbid 0, because PostmasterMain() simply checks dbid >= 0 (non-utility mode); it says:
> This value must be >= 0, or >= -1 in utility mode
It seems 0 is a valid value.

Changes:
- use the specified master dbid field in CREATE_QD_DB.
- remove the unused macros MASTER_DBID and InvalidDbid in C sources.

Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
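To illustrate the intended behavior, here is a minimal Python sketch that reads the dbid from a `~`-separated master entry instead of hard-coding 1. It is not the gpinitsystem shell code; the `QdEntry` type, the helper names, and the field names (host, port, data directory, dbid, content) are assumptions inferred from the two example lines in the commit message.

```python
from typing import NamedTuple


class QdEntry(NamedTuple):
    """Hypothetical view of one '~'-separated master entry."""
    host: str
    port: int
    datadir: str
    dbid: int
    content: int


def parse_qd_entry(line):
    """Split an entry such as 'mdw~5432~/data/master/gpseg-1~2~-1'."""
    host, port, datadir, dbid, content = line.strip().split("~")
    entry = QdEntry(host, int(port), datadir, int(dbid), int(content))
    if entry.dbid < 0:
        # Mirrors the quoted PostmasterMain() rule for non-utility mode.
        raise ValueError("dbid must be >= 0")
    return entry


def gp_dbid_setting(entry):
    """Build the setting from the specified dbid rather than a fixed 1."""
    return f"gp_dbid={entry.dbid}"
```

For the two entries shown in the commit message, this yields `gp_dbid=2` and `gp_dbid=0`, which is the value the catalog is expected to agree with.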
-
- 29 Oct, 2020 2 commits
-
-
Committed by Lisa Owen
-
Committed by dh-cloud
If cdbcomponent_getCdbComponents() caught an error thrown by the function getCdbComponents, FtsNotifyProber would be called. But if this happened inside the fts process, the fts process would hang. Skip the fts probe for the fts process; after that, in the same situation, the fts process exits and is then restarted by the postmaster.
-
- 28 Oct, 2020 5 commits
-
-
Committed by (Jerome)Junfeng Yang
Collect tuple-related pgstat table info from segments, so that auto-analyze can now consider partition tables. Previously we did not have accurate pgstat for partition leaf tables: this kind of info is counted through the access method on segments, and we used to collect it from the estate es_processed count on QD. So when inserting into the root partition table, we could not know how many tuples were inserted into each leaf, and autovacuum never triggered auto-ANALYZE for leaf tables.

The idea is that each writer QE reports the table pgstat of the current nest-level xact to QD through libpq at the end of a query statement. A single statement does not operate on too many tables, so the overhead is small. On QD, we retrieve and combine these tables' stats from the dispatch result and add them to the current nest-level xact pgstats. Now we can remove the old pgstat collection code on QD.

The pgstat for a table can be viewed by querying `pg_stat_all_tables_internal`. Except for the scan-related counters, the counters should now be accurate. On master, the scan-related counters of a table's pgstat are not gathered from segments yet; this requires extra work. The current implementation is already enough to support auto-ANALYZE on partition tables.

Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
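The "retrieve and combine" step on QD can be pictured with a short Python sketch. This only illustrates the aggregation idea and is not the C implementation: the `combine_table_stats` helper, the report layout (one dict per writer QE, keyed by table name), and the counter name `tuples_inserted` are all hypothetical.

```python
from collections import Counter


def combine_table_stats(per_segment_reports):
    """Sum per-table counters reported by each writer QE into one view on QD."""
    combined = {}
    for report in per_segment_reports:
        for table, counters in report.items():
            # Counter.update adds the reported values to the running totals.
            combined.setdefault(table, Counter()).update(counters)
    return combined


# Two segments report inserts into leaf partitions of the same root table.
reports = [
    {"sales_1_prt_1": {"tuples_inserted": 3}},
    {"sales_1_prt_1": {"tuples_inserted": 5},
     "sales_1_prt_2": {"tuples_inserted": 2}},
]
totals = combine_table_stats(reports)
```

With per-leaf totals like these available on QD, a threshold check for auto-ANALYZE can be applied to each leaf table rather than only to the root.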
-
Committed by 盏一
In some cases, signals (like SIGQUIT) that should only be processed by the main thread of the postmaster may be dispatched to rxThread. So we should block all signals in the UDP pthreads, and it is safe to do so. Fix #11006
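The same idea, blocking signals inside a worker thread so that only the main thread receives them, can be demonstrated with Python's `signal.pthread_sigmask`. This is an analogy under stated assumptions, not the interconnect code: the actual fix is in C, the function names below are hypothetical, and the sketch needs a Unix platform with Python 3.8 or later.

```python
import signal
import threading


def rx_thread_body(result):
    """Stand-in for a receive thread that must not handle process signals."""
    # Block every blockable signal for this thread only; the mask is per-thread.
    signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    # Blocking an empty set changes nothing and returns the current mask.
    result["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])


def run_worker():
    """Start the worker, wait for it, and return the mask it ended up with."""
    result = {}
    worker = threading.Thread(target=rx_thread_body, args=(result,))
    worker.start()
    worker.join()
    return result["mask"]
```

Because the mask is set from inside the worker, the main thread's mask is left untouched and it remains the thread that the kernel can deliver signals such as SIGQUIT to.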
-
Committed by Lisa Owen
-
Committed by Lisa Owen
-
Committed by Jesse Zhang
We started hitting this on Thursday, and there have been ongoing reports from the community about this as well. While upstream is figuring out a long-term solution [1], we've been advised [2] to pin to the previous release (v0.21.0) to avoid being blocked for hours at a time. [1]: https://github.com/telia-oss/github-pr-resource/pull/238 [2]: https://github.com/telia-oss/github-pr-resource/pull/238#issuecomment-714830491
-