- 24 Jun 2020, 1 commit

Committed by Jinbao Chen

If the tablespace directory does not exist, we would get an error at transaction commit, and an error during commit causes a panic. So check that the tablespace directory exists up front, so that the panic is avoided.
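
A minimal sketch of the scenario the check guards against; the path and names here are hypothetical:

```sql
-- Hypothetical names/paths; assumes the tablespace directory disappears
-- out-of-band (e.g. removed by an operator) after the tablespace is created.
CREATE TABLESPACE demo_ts LOCATION '/data/tablespaces/demo';
BEGIN;
CREATE TABLE t_in_ts (a int) TABLESPACE demo_ts DISTRIBUTED BY (a);
-- If '/data/tablespaces/demo' has vanished by now, the failure used to
-- surface at COMMIT and escalate to a panic; it is now caught earlier.
COMMIT;
```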

- 22 Jun 2020, 1 commit

Committed by (Jerome) Junfeng Yang

This fixes the error:

```
--- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/appendonly.out 2020-06-16 08:30:46.484398384 +0000
+++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/appendonly.out 2020-06-16 08:30:46.556404454 +0000
@@ -709,8 +709,8 @@
 SELECT oid FROM pg_class WHERE relname='tenk_ao2'));
    case    | objmod | last_sequence | gp_segment_id
 -----------+--------+---------------+---------------
+ NormalXid |      0 | 1-2900        |             1
  NormalXid |      0 | >= 3300       |             0
- NormalXid |      0 | >= 3300       |             1
  NormalXid |      0 | >= 3300       |             2
  NormalXid |      1 | zero          |             0
  NormalXid |      1 | zero          |             1
```

The test is flaky because under ORCA a `CREATE TABLE` statement without `DISTRIBUTED BY` treats the table as randomly distributed, whereas the planner treats it as distributed by the table's first column.

ORCA:
```
CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0, blocksize=262144) AS SELECT * FROM tenk_heap;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
```

Planner:
```
CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0, blocksize=262144) AS SELECT * FROM tenk_heap;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'unique1' as the Greenplum Database data distribution key for this table.
```

So the data distribution for table tenk_ao2 is not as expected.

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>

- 08 Jun 2020, 1 commit

Committed by Paul Guo

Use the GUC gp_role only now, and fold the functionality of the GUC gp_session_role into it. Previously we had both GUCs. The difference between the two (copied from a code comment):

     * gp_session_role
     *   - does not affect the operation of the backend, and
     *   - does not change during the lifetime of PostgreSQL session.
     *
     * gp_role
     *   - determines the operating role of the backend, and
     *   - may be changed by a superuser via the SET command.

This is not friendly for coding. For example, you could find Gp_role and Gp_session_role both set to GP_ROLE_DISPATCH in the postmaster and many aux processes on all nodes (even QE nodes) in a cluster, so to tell a QD postmaster from a QE postmaster, current gpdb uses an additional -E option in the postmaster arguments. This is confusing for developers writing role-related branch code, given we have three related variables. Also, some related code is even buggy now (e.g. 'set gp_role' FATAL-quits).

With this patch we just have gp_role. Some changes in the patch which might be interesting:

1. For the postmaster, we should specify '-c gp_role=' (e.g. via a pg_ctl argument) to determine the role, else we assume the utility role.
2. For a stand-alone backend, the utility role is enforced (no need for users to specify it).
3. One can still connect to QE/QD nodes in utility mode with PGOPTIONS, etc., as before.
4. Remove the '-E' gpdb hack and align the '-E' usage with upstream.
5. Move pm_launch_walreceiver out of the fts-related shmem, given the latter is not used on QEs.

Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Reviewed-by: Gang Xiong <gxiong@pivotal.io>
Reviewed-by: Hao Wu <gfphoenix78@gmail.com>
Reviewed-by: Yandong Yao <yyao@pivotal.io>

- 03 Jun 2020, 1 commit

Committed by Andrey Borodin

Cluster on AO tables is implemented by sorting the entire AO table with the tuple sort framework, according to a btree index defined on the table. A faster way to cluster is to scan the tuples in index order, but this requires index-scan support. Append-optimized tables do not support index scans currently, but when that support is added, the cluster operation can be enhanced accordingly.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed and slightly edited by: Asim R P <pasim@vmare.com>

Merges GitHub PR #9996
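
A minimal sketch of the operation this enables, with hypothetical table and index names:

```sql
-- Hypothetical AO table and index; CLUSTER rewrites the table in index
-- order via tuplesort, since AO tables have no index-scan support yet.
CREATE TABLE ao_events (id int, payload text)
    WITH (appendonly=true) DISTRIBUTED BY (id);
CREATE INDEX ao_events_id_idx ON ao_events (id);
CLUSTER ao_events USING ao_events_id_idx;
```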

- 31 May 2020, 2 commits

Committed by Heikki Linnakangas

Because why not? This was forbidden back in 2012, with an old JIRA ticket that said:

    MPP-15735 - Inheritance is not supported with column oriented table

    This shouldn't be possible:

    Create table parent_tb2 (a1 int, a2 char(5), a3 text, a4 timestamp, a5 date, column a1 ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768)) with (appendonly=true,orientation=column) distributed by (a1);

    Create table child_tb4 (c1 int, c2 char) inherits(parent_tb2) with (appendonly = true, orientation = column);

    The reason is, dealing with column compression and inheritance is tricky and inheritance shouldn't even be supported.

That explanation didn't go into any details, but I can see the trickiness. There are many places you can specify column compression options even without inheritance: in the gp_default_storage_options GUC, in an ENCODING directive in a CREATE TABLE command, in a "COLUMN foo ENCODING ..." directive in the CREATE TABLE command, or in the datatype's default storage options. Inheritance adds another dimension to it: should the current gp_default_storage_options override the options inherited from the parent? What if there are multiple parents with conflicting options?

The easiest solution is to never inherit the column encoding options from the parent. That's a bit lame, but there's some precedent for that with partitions: if you ALTER TABLE ADD PARTITION, the encoding options are also not copied from the parent. So that's what this patch does.

One interesting corner case is specifying the options for an inherited column with "CREATE TABLE (COLUMN parentcol ENCODING ...) INHERITS (...)". Thanks to the previous refactoring, that works, even though 'parentcol' is not explicitly listed in the CREATE TABLE command but inherited from the parent (sketched below). To make dump & restore of that work correctly, modify pg_dump so that it always uses the "COLUMN foo ENCODING ..." syntax to specify the options, instead of just tacking "ENCODING ..." after the column definition.

One excuse for doing this right now is that even though we had forbidden "CREATE TABLE <tab> INHERITS (<parent>)" on AOCO tables, we missed "ALTER TABLE <tab> INHERIT <parent>". That was still allowed, and if you did that you got an inherited AOCO table that worked just fine, except that when it was dumped with pg_dump, the dump was unrestorable. It would have been trivial to add a check to forbid "ALTER TABLE INHERIT" on an AOCO table instead, but it's even better to allow it.

Fixes https://github.com/greenplum-db/gpdb/issues/10111
Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
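
A minimal sketch of that corner case, using hypothetical table names; the ENCODING values are illustrative only:

```sql
-- Parent is an AOCO table; the child overrides the encoding of the
-- inherited column a1 via a COLUMN ... ENCODING clause, even though a1 is
-- not listed among the child's own columns. Names/options are hypothetical.
CREATE TABLE parent_aoco (a1 int, a2 text,
    COLUMN a1 ENCODING (compresstype=zlib, compresslevel=9))
    WITH (appendonly=true, orientation=column) DISTRIBUTED BY (a1);
CREATE TABLE child_aoco (c1 int,
    COLUMN a1 ENCODING (compresstype=zlib, compresslevel=1))
    INHERITS (parent_aoco)
    WITH (appendonly=true, orientation=column);
```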

Committed by Heikki Linnakangas

Instead of parsing them in the parse analysis phase, delay it until DefineRelation(), after merging the definitions of inherited attributes. We don't support an INHERITS clause on an AOCO table today, but if we lift that limitation (as I'm planning to do in a follow-up commit), we need to be able to apply "COLUMN <col> ENCODING ..." clauses to inherited columns too.

Before this, all callers of create_ctas_internal() also had to call AddDefaultRelationAttributeOptions(), in case it was an AOCO table. Now that DefineRelation() handles the default storage options, that's no longer needed.

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>

- 22 May 2020, 1 commit

Committed by Huiliang.liu

pkill is not in the /bin/ folder on Ubuntu, so gpfdist can't be killed in the sreh test, which makes the gpfdist regression test fail.

- 19 May 2020, 1 commit

Committed by Adam Lee

Keep the syntax of external tables, but store them as foreign tables, with options, in the catalog. When such a table is used, transform the foreign table options into an ExtTableEntry, which is compatible with external_table_shim, PXF and other custom protocols.

Also, add `DISTRIBUTED` syntax support for `CREATE FOREIGN TABLE` if the foreign server indicates it's an external table.

Notes:
1. All permissions handling from pg_authid stays as-is, to keep the external tables' GRANT queries working; that will be dealt with in another PR if possible.
2. Multiple URI locations are stored in the foreign table options, separated by `|`.

Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
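
A hedged sketch of the user-visible shape of this; the server and option names below are assumptions for illustration, not the exact catalog representation:

```sql
-- External-table syntax is kept as-is, but now lands in the catalog as a
-- foreign table; location, format, etc. become foreign table options.
CREATE EXTERNAL TABLE ext_sales (id int, amount numeric)
    LOCATION ('gpfdist://etlhost:8081/sales.txt')
    FORMAT 'TEXT';

-- DISTRIBUTED is also accepted on CREATE FOREIGN TABLE when the foreign
-- server marks the table as external ("ext_server" and the option names
-- here are hypothetical).
CREATE FOREIGN TABLE ext_sales2 (id int, amount numeric)
    SERVER ext_server
    OPTIONS (location_uris 'gpfdist://etlhost:8081/a.txt|gpfdist://etlhost:8081/b.txt',
             format 'text')
    DISTRIBUTED BY (id);
```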

- 28 Apr 2020, 1 commit

Committed by Paul Guo

It was removed during the slice refactoring work. I found that when running the test isolation2/terminate_in_gang_creation. This feature should be quite useful under parallel OLTP load.

With this code change, the test isolation2/terminate_in_gang_creation needs to be modified to be deterministic. Also added tests in regression/dispatch for this code change. Changes in other tests are not relevant (just cleanup).

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>

- 26 Apr 2020, 1 commit

Committed by xiong-gang

ALTER DATABASE SET ... FROM CURRENT dispatches an incorrect statement to the segments. Reported in https://github.com/greenplum-db/gpdb/issues/9823
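
A minimal sketch of the statement shape involved; the database name and GUC are hypothetical:

```sql
-- Capture the session's current setting as the database-level default.
-- The form of this statement dispatched to segments was previously wrong.
SET work_mem = '64MB';
ALTER DATABASE demo_db SET work_mem FROM CURRENT;
```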

- 20 Mar 2020, 1 commit

Committed by (Jerome) Junfeng Yang

For ETL user scenarios, there are cases that frequently create and drop the same external table, and once the external table is dropped, all errors stored in the error log are lost.

To make the error log persistent for external tables with the same "dbname"."namespace"."table", bring in the "error_log_persistent" external table option. If the external table is created with `OPTIONS (error_log_persistent 'true')` and `LOG ERRORS`, its error log will be named "dbid_namespaceid_tablename" under the "errlogpersistent" directory, and dropping the external table will not delete the error log. Since GPDB 5 and 6 still use pg_exttable's options to mark LOG ERRORS PERSISTENTLY, keep the ability to load from OPTIONS(error_log_persistent 'true').

Create a separate `gp_read_persistent_error_log` function to read the persistent error log. If the external table has been deleted, only the namespace owner has permission to read the error log.

Create a separate `gp_truncate_persistent_error_log` function to delete the persistent error log. If the external table has been deleted, only the namespace owner has permission to delete the error log. It also supports wildcard input to delete error logs belonging to a database or the whole cluster.

If you drop an external table created with `error_log_persistent` and then create the same "dbname"."namespace"."table" external table without the persistent error log option, it'll write errors to the normal error log; the persistent error log still exists.

Reviewed-by: Haozhou Wang <hawang@pivotal.io>
Reviewed-by: Adam Lee <ali@pivotal.io>
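
A hedged sketch of the option in use; the clause placement and the function's argument form are assumptions based on the description above:

```sql
-- Hypothetical external table with a persistent error log.
CREATE EXTERNAL TABLE ext_load (id int, v text)
    LOCATION ('gpfdist://etlhost:8081/load.txt')
    FORMAT 'TEXT'
    OPTIONS (error_log_persistent 'true')
    LOG ERRORS SEGMENT REJECT LIMIT 100;

-- The error rows survive a DROP of the table; read them back afterwards
-- (argument form assumed from the commit description).
DROP EXTERNAL TABLE ext_load;
SELECT * FROM gp_read_persistent_error_log('ext_load');
```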

- 17 Mar 2020, 1 commit

Committed by Heikki Linnakangas

There are two parts to this commit:

1. Instead of using ctids and "fake" ctids to disambiguate rows that have been duplicated by a Broadcast motion, generate a synthetic 64-bit rowid specifically for that purpose, just below the Broadcast. The old method had to generate subplan IDs, fake ctids etc. to have a combination of columns that formed the unique ID, and that entailed a lot of infrastructure, like the whole concept of "pseudo columns". The new method is much simpler, and it works the same way regardless of what's below the Broadcast.

   The new 64-bit rowids are generated by a new RowIdExpr expression node. The high 16 bits of the rowid are the segment ID, and the low bits are generated by a simple counter.

2. Replace the cdbpath_dedup_fixup() post-processing step on the Plan tree by adding the unique ID column (now a RowIdExpr) to the Path's pathtarget in the first place. cdbpath_motion_for_join() is the function that does that now. Seems like a logical place, since that's where any needed Motions are added on top of the join inputs. The responsibility for creating the Unique path is moved from cdb_add_join_path() to the create_nestloop/hashjoin/mergejoin_path() functions.

This slightly changes the plan when one side of the join has Segment or SegmentGeneral locus. For example:

    select * from srf() r(a) where r.a in (select t.a/10 from tab t);

With this commit, that's implemented by running the FunctionScan and generating the row IDs on only one node, and redistributing/broadcasting the result. This is not exactly the same as before this PR: previously, if the function was stable or immutable, we assumed that a SRF returns rows in the same order on all nodes and relied on that when generating the "synthetic ctid" on each row. We no longer assume that, but force the function to run on a single node instead. That seems acceptable: this kind of plan is helpful when the other side of the join is much larger, so redistributing the smaller side is probably still OK. The same strategy is also used when the inner side is a replicated table rather than a SRF. That was not supported before, but there was a TODO comment that hinted at the possibility.

Fixes https://github.com/greenplum-db/gpdb/issues/9741, and adds tests for it.

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>

- 09 Mar 2020, 1 commit

Committed by Heikki Linnakangas

Overview
--------

Manage the allocation of AO segfiles in QE nodes, instead of the master. That simplifies a lot of things. We no longer need to send the decision on which segfile to use from QD to QE, we no longer need to keep up-to-date tuple counts in the master, and so forth.

This gets rid of the in-memory hash table to track AO segment status. To lock a segfile for insertion or other operations, we now rely on tuple locks on the AO segment table instead.

VACUUM code has been refactored. I tried to make the code structure closer to upstream, to reduce merge conflicts as we merge with upstream. The way the VACUUM phases are dispatched, and the phases needed, are different in the new regime. Some phases could use local transactions, but others still have to use distributed transactions; for simplicity we use distributed transactions for all of the phases.

Now that QE nodes choose the insertion target segfile independently, we could use the same algorithm for choosing the target segfile in utility mode, and wouldn't need to reserve segfile #0 for special operations anymore. But to keep things familiar, and to avoid any more churn in the regression tests, this keeps the historical behavior and still reserves segfile #0 for utility mode and CREATE TABLE AS.

Previously, SERIALIZABLE (and REPEATABLE READ) transactions were treated differently from transactions in READ COMMITTED mode. If there was an old serializable transaction running, VACUUM refrained from recycling old AO segfiles after compaction, because the serializable transaction might lock the table later and try to read the rows that were compacted away. But that was not completely waterproof; a READ COMMITTED transaction that's still holding onto an old snapshot is no different from a serializable transaction. There's a gap between acquiring a snapshot and locking a table, so even if a READ COMMITTED backend is not holding a lock on a table right now, it might open it later. To fix that, remove the distinction between serializable and other transactions. Any open transaction will now prevent AO segfiles in awaiting-drop state from being removed. This commit adds an isolation2 test case called 'vacuum_self_function' to test one such case; it used to fail before this commit.

That is a user-visible change in behavior that's worth calling out explicitly: if there are concurrent transactions running, even in READ COMMITTED mode, and even if they don't access the table at all, VACUUM will no longer be able to recycle compacted segfiles. The next VACUUM will clean them up, after the old concurrent transactions have finished. Meanwhile, the awaiting-drop segfiles will use up disk space, and increase the risk that an insertion fails because all segfiles are in use.

This work was originally submitted as github PR #790; you might see references to that in the discussions.

Code change details
-------------------

This gets rid of the in-memory hash table to track AO segment status. To lock a segfile for insertion or other operations, we now rely on tuple locks on the AO segment table instead. When an AO segment is chosen as insertion target, its AO segment table row is locked for the transaction (by heap_lock_tuple, i.e. by stamping its XMAX). Later, when the insertion finishes, the tuple is updated with the new EOF (no change there). When choosing a new insertion target segfile, we scan the pg_aoseg table instead of the hash table, and use the xmin/xmax to determine which ones are safe to use. The process of choosing a target segment itself is not concurrency-safe, so the relation extension lock is held to protect that. See the new "Locking and snapshots" section in src/backend/access/appendonly/README.md for details.

tupcount and modcount are not kept up-to-date in the master. Remove all the code needed to send the totals back from QE to QD at end of a command, including the GPDB-specific 'o' protocol FE/BE message.

Remove the machinery to track which PROCs are in a serializable transaction. It was not completely waterproof anyway, and there were some existing bugs in that area. A READ COMMITTED transaction that's still holding onto an old snapshot is no different from a serializable transaction. This commit adds an isolation2 test case called 'vacuum_self_function' to test one such case; it used to fail before this commit.

Move the responsibility of creating the AO segfile to the callers of appendonly_insert_init(). The caller must have locked, and created if necessary, the segfile by calling ChooseSegnoForWrite() or LockSegnoForWrite(). Previously, pg_aoseg was not locked in rewriting ALTER TABLE and other such commands that always used segfile #0. That was OK, because they held an AccessExclusiveLock on the table, but let's be consistent.

When selecting the insertion target for VACUUM compaction, prefer creating a new segfile over using an existing non-empty segfile. Existing empty segfiles are still choice #1, but let's avoid writing to non-empty existing files, because otherwise we cannot vacuum those segfiles in the same VACUUM cycle.

Warn, if compaction fails because no insertion segfile could be chosen. This used to be an ERROR in the old code, but I think a WARNING is more appropriate.

In targetid_get_partition(), don't create duplicate ResultRelInfo entries. It's possible that the "result relation" is a partition of a partitioned table, rather than the root. Don't create a new ResultRelInfo hash entry for it in that case, use the parent ResultRelInfo. This was harmless before, but now the logic of when we need to increment the modcount was getting confused. Per the 'qp_dropped_cols' regression test.

Test change details
-------------------

src/test/regress/greenplum_schedule

    Run the uao[cs]_catalog_tables tests separately. Because we are now more lazy with dropping AWAITING_DROP segments, the issue mentioned in the comment with serializable transactions now applies to all transactions.

uao_compaction/threshold
uaocs_compaction/threshold

    Adjust the uao[cs]_compaction/threshold test cases to make them pass. In the first few changed outputs, I'm not entirely sure why one segment chooses to insert the new tuples into segfile #2 while others pick segfile #1, but there's also nothing wrong with that. The point of this PR is that QE nodes can make the decision independently. In the last test with gp_toolkit.__gp_aovisimap_compaction_info(), all the test data is inserted on one QE node, so segfile 2 isn't allocated on the others.

isolation2/vacuum_self_function

    Add a test case for the non-SERIALIZABLE case discussed earlier. See also the discussion at https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ

    This test errored out before this commit:

        ERROR:  read beyond eof in table "ao" file "base/16384/16387.1", read position 0 (small offset 0), actual read length 0 (large read length 192) (cdbbufferedread.c:211)

    It works with this PR, but we don't drop old AO segments as aggressively in vacuum anymore. Partly because we're now more careful and correct. But also because we don't detect the case that the VACUUM is the oldest transaction in the system and nothing else is running; we miss the opportunity to drop compacted AO segments immediately in that case. The next VACUUM should drop them, though.

    Furthermore, this test is written so that if we tried to perform the AO compaction phase in a local rather than a distributed transaction, it would fail. (I know because I tried to do it that way at first.) This is "Problem 3" that I mentioned on that gpdb-dev thread.

isolation2/fsync_ao

    Adjust the "hit counts" in the fsync_ao test. The number of times the fsyncs happen has changed. To be honest, I'm not sure why, nor what the principled way to get the correct value would be, but as long as the counts are > 0, this looks like success to me.

isolation2/gdd/planner_insert_while_vacuum_drop

    Remove a GDD test that's not relevant anymore. The test was testing a deadlock between QD and QE involving AO vacuum. We no longer take an AccessExclusiveLock in the QD, so the test case doesn't work. I'm not sure what an equivalent deadlock might be with the new AO vacuum, so just remove the test.

isolation2/vacuum_drop_phase_ao
isolation2/uao/insert_should_not_use_awaiting_drop

    These tests don't make sense in the QE-managed regime anymore. I couldn't figure out what they should look like with this commit, so remove them.

isolation2/uao/insert_policy

    Update the test case to cover a scenario where two transactions try to create the initial segment for a relation. We had a bug in that scenario during development, and it wasn't covered by any existing tests.

isolation2/uao/select_while_vacuum_serializable2

    The behavior is different now: a serializable transaction doesn't prevent the compaction phase from happening, only the recycling of the segfile. Adjust the expected output accordingly.

isolation2/uao/compaction_utility

    VACUUM in utility mode can compact now.

Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Xin Zhang <xzhang@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/lUwtNxaT3f0/Cg39CP8uAwAJ

- 04 Mar 2020, 1 commit

Committed by Abhijit Subramanya

The relation statistics object stores a flag which indicates whether a relation is empty. However, this was being set based on the value of the stats_empty flag, which was set in the function `cdb_estimate_partitioned_numtuples`: it would be set to true if any of the child partitions had relpages set to 0. But if other partitions had data in them, then the relation should not be considered empty. We should instead check the total number of rows to determine whether a partitioned table is empty. Also, the stats_empty flag was not being used for anything else, so this commit removes it.

- 31 Jan 2020, 3 commits

Committed by Heikki Linnakangas

This makes external tables less of a special case throughout the planner and executor. They're now mostly handled through the FDW API callbacks.

* Add an FDW handler function and implement the API routines to construct ForeignScan paths and Plan nodes, iterate through a scan, and insert tuples into the external/foreign table. The API routines just call through to the existing external table functions, like external_getnext() and external_insert().

* Remove the ExternalScan plan node from the executor and the ExternalScan path node from the planner. Use ForeignScan plan and path nodes instead, like for any other FDW. Move code related to external table planning to a new exttable_fdw_shim.c file.

* The parameters previously carried in the ExternalScan struct are now in a new ExternalScanInfo struct. (Or, the ExternalScan struct has been renamed to ExternalScanInfo, if you want to think of it that way.) It's not a plan node type anymore, but it still needs read/out function support so that the parameters can be serialized. Alternatively, the parameters could be carried in Lists of Values and other existing expression types, but a struct seems easier to handle. Perhaps the cleanest solution would be to use the ExtensibleNode infrastructure for it, but I'll leave that for another patch. As long as the external table FDW is in the backend anyway, it's simplest to have the out/read/copy funcs built-in as well.

* Modify the ForeignScan executor node so that it can make up "fake ctids" like ExternalScan did, and also add "squelching" support to it.

* Remove the special handling of external tables from the ModifyTable node. It now uses the normal FDW routines for them. COPY still calls directly into the external_insert() function, because PostgreSQL doesn't support COPY into a foreign table until version 11. (We don't seem to have any tests for COPY TO/FROM an external table. TODO: add a test.)

Committed by Heikki Linnakangas

External tables now use relkind='f', like all foreign tables. They have an entry in pg_foreign_table, as if they belonged to a special foreign server called "exttable_server". That foreign server gets special treatment in the planner and executor, so that we still plan and execute it the same as before.

* ALTER / DROP EXTERNAL TABLE is now mapped to ALTER / DROP FOREIGN TABLE. There is no "OCLASS_EXTTABLE" anymore. This leaks through to the user in error messages, e.g.:

      postgres=# drop external table boo;
      ERROR:  foreign table "boo" does not exist

  and in the command tag on success:

      postgres=# drop external table boo;
      DROP FOREIGN TABLE

* psql \d now prints external tables as Foreign Tables.

Next steps:

* Use the foreign table API routines instead of special-casing "exttable_server" everywhere.

* Get rid of the pg_exttable table, and store all the options in pg_foreign_table.ftoptions instead.

* Get rid of the extra fields in pg_authid that store permissions to create different kinds of external tables. Store them as ACLs in pg_foreign_server.

Committed by Heikki Linnakangas

In PostgreSQL, ALTER TABLE is allowed where ALTER FOREIGN TABLE is. Allow external tables the same leniency.
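
A minimal sketch of what this permits, with hypothetical names:

```sql
-- "ext_sales" is a hypothetical external table; plain ALTER TABLE is now
-- accepted on it, mirroring how ALTER TABLE works on foreign tables upstream.
ALTER TABLE ext_sales OWNER TO etl_role;
ALTER TABLE ext_sales RENAME TO ext_sales_old;
```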

- 14 Jan 2020, 1 commit

Committed by Heikki Linnakangas

Commit 25a7630e silenced a lot of spurious "time constraints added on superuser rule" warnings in the log. But there was also one such warning memorized in the expected output, which was overlooked by that commit. Remove the warning.

The warning was memorized in the expected output in the 9.0 merge (commit 5ec5c30a), but it should never have been printed. Making a role superuser with "ALTER ROLE <role> SUPERUSER" drops all pg_auth_time_constraint entries for the role, so this check_auth_time_constraints() call shouldn't print such a warning.

- 11 Jan 2020, 1 commit

Committed by Heikki Linnakangas

* Build a preliminary "planner slice table" at the end of planning, and attach it to the PlannedStmt. Executor startup turns that into the final executor slice table. This replaces the step where executor startup scanned the whole Plan tree to build the slice table.

* Now that executor startup gets a pre-built planner slice table, it doesn't need the Flow structures for building the slice table anymore. Also refactor the few other remaining places in nodeMotion.c and nodeResult.c that accessed the Flows to use the information from the slice table instead. The executor no longer looks at the Flows at all, so we don't need to include them in the serialized plan tree anymore. The ORCA translator doesn't need to build Flow structures anymore either; instead, it now builds the planner slice table like the Postgres planner does.

* During createplan.c processing, keep track of the current "slice", and attach direct dispatch and other per-slice information to the PlanSlice struct directly, instead of carrying it in the Flow structs. This renders the Flows mostly unused in the planner too, but there is still one thing we use the Flows for: to figure out when we need to add a Motion on top of a SubPlan's plan tree, to make the subplan's result available in the slice where the SubPlan is evaluated. There's a "sender slice" struct attached to a Motion during create_plan() processing to represent the sending slice, but the slice ID is not assigned at that stage yet. Motion / slice IDs are assigned later, when the slice table is created.

* Only set 'flow' on the topmost Plan, and in the child of a Motion.

* Remove unused initplans and subplans near the end of planning, after set_plan_references(), but before building the planner slice table. We used to remove initplans and subplans a little earlier, before set_plan_references(), but there was at least one corner case involving inherited tables where you could have a SubPlan referring to a subplan in the plan tree before set_plan_references(), but set_plan_references() transformed it away. You would end up with an unused subplan in that case, even though we previously removed any unused subplans. This way we don't need to deal with unused slices in the executor.

* Rewrite the logic to account for the direct-dispatch cost saving in the plan cost.

Reviewed-by: Taylor Vesely <tvesely@pivotal.io>

- 03 Jan 2020, 2 commits

Committed by Heikki Linnakangas

Previously, this was allowed:

    postgres=# ALTER TABLE parttab DROP PARTITION;
    ALTER TABLE

It dropped the first partition, by rank. In other words, it was a shorthand for:

    ALTER TABLE parttab DROP PARTITION FOR (RANK (1));

We don't need such a shorthand. It's too much of a footgun; if a user wants to drop a partition, they should name it properly. The documentation doesn't mention this syntax at all, so no docs changes are required.

Reviewed-by: Asim R P <apraveen@pivotal.io>

Committed by Ashwin Agrawal

The alter_db_set_tablespace test has scenarios that inject an error fault for content 0 and then run the ALTER DATABASE SET TABLESPACE command. Once the error is hit on content 0, the transaction is aborted. Depending on when the transaction gets aborted, it's unpredictable what point the command has reached on the non-content-0 primaries. Only if the non-content-0 primaries have reached the point of directory copy will their abort records contain a database directory deletion record to be replayed on the mirror.

The test was waiting for the directory deletion fault to be triggered on all the content mirrors. This expectation is incorrect and makes the test flaky, depending on timing. Hence, modify the test for the error scenarios to only wait for directory deletion on content 0, then wait for all the mirrors to replay all the currently generated WAL records, and after that make sure the destination directory is empty. This should eliminate the flakiness from the test.

Reviewed-by: Asim R P <apraveen@pivotal.io>

- 31 Dec 2019, 1 commit

Committed by Ashwin Agrawal

The test should make sure the mirror has processed the drop-database WAL record before proceeding to check that the destination tablespace directory does not exist. It skipped performing the wait for content 0 in case of a panic after writing the WAL record; that's incorrect. Change it to wait for all the mirrors to process the WAL record and only then perform the validation. This should fix the failures seen in CI with the diff below:

```
--- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/alter_db_set_tablespace.out 2019-10-14 16:09:43.638372174 +0000
+++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/alter_db_set_tablespace.out 2019-10-14 16:09:43.714379108 +0000
@@ -1262,25 +1271,352 @@
 CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 4
 CONTEXT: PL/Python function "stat_db_objects"
-NOTICE: dboid dir for database alter_db does not exist on dbid = 5
-CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 6
 CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 7
 CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 8
 CONTEXT: PL/Python function "stat_db_objects"
- dbid | relfilenode_dboid_relative_path | size
-------+---------------------------------+------
- 1    |                                 |
- 2    |                                 |
- 3    |                                 |
- 4    |                                 |
- 5    |                                 |
- 6    |                                 |
- 7    |                                 |
- 8    |                                 |
-(8 rows)
+ dbid | relfilenode_dboid_relative_path | size
+------+---------------------------------+--------
+ 1    |                                 |
+ 2    |                                 |
+ 3    |                                 |
+ 4    |                                 |
+ 5    | 180273/112                      | 32768
+ 5    | 180273/113                      | 32768
+ 5    | 180273/12390                    | 65536
+ 5    | 180273/12390_fsm                | 98304
<...chopping output as it is very long...>
+ 5    | 180273/PG_VERSION               | 4
+ 5    | 180273/pg_filenode.map          | 1024
+ 6    |                                 |
+ 7    |                                 |
+ 8    |                                 |
+(337 rows)
```

Reviewed-by: Asim R P <apraveen@pivotal.io>

- 25 Dec 2019, 1 commit

Committed by Ning Yu

The alter_db_set_tablespace test has been flaky for a long time; one typical failure looks like this:

    --- /regress/expected/alter_db_set_tablespace.out
    +++ /regress/results/alter_db_set_tablespace.out
    @@ -1204,21 +1213,348 @@
     NOTICE: dboid dir for database alter_db does not exist on dbid = 2
     NOTICE: dboid dir for database alter_db does not exist on dbid = 3
     NOTICE: dboid dir for database alter_db does not exist on dbid = 4
    -NOTICE: dboid dir for database alter_db does not exist on dbid = 5
     NOTICE: dboid dir for database alter_db does not exist on dbid = 6
     NOTICE: dboid dir for database alter_db does not exist on dbid = 7
     NOTICE: dboid dir for database alter_db does not exist on dbid = 8

The test disables fts probing with fault injection, but it does not wait for the fault to be triggered. The other problem is that fts probing was disabled after the PANIC, which might not be in time. So the problem was that we were injecting the fault after the fts loop was already beyond the fault point, and then, when the subsequent PANIC was caused, fts was still active. By manually triggering the fault, and then waiting to ensure that it is hit at least once, we can guarantee that the scenario described above doesn't happen.

Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
Reviewed-by: Hubert Zhang <hzhang@pivotal.io>

- 23 Dec 2019, 2 commits

Committed by Heikki Linnakangas

Exclusion constraints can be enforced if the table is replicated, or if the distribution key columns are part of the constraint, with the same = operator as in the distribution key's hash opclass. This is analogous to the similar restrictions on unique indexes.

Exclusion constraints on partitioned tables would have similar restrictions, but because of the way creation of partitioned tables currently works, checking for those restrictions would be tricky. A partitioned table is only marked as partitioned after the indexes have been created, so we cannot check for compatibility with the partitioning keys during index creation, like we can with distribution keys. Refactoring the creation of partitioned tables is more than I can chew right now, so for now, continue to disallow exclusion constraints on partitioned tables.

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Reviewed-by: Kalen Krempely <kkrempely@pivotal.io>
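
A minimal sketch of a constraint the new rule permits, with hypothetical names:

```sql
-- Allowed: the distribution key (room) is part of the constraint, compared
-- with the same = operator as the hash opclass, so conflicting rows land on
-- the same segment and the constraint can be enforced locally.
CREATE TABLE reservations (
    room int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
) DISTRIBUTED BY (room);
```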

Committed by Heikki Linnakangas

We had checks that a unique/primary key index must contain all the distribution key columns. Otherwise, rows with duplicate index keys might be stored on different segments, and we would fail to enforce the uniqueness. But the checks didn't pay any attention to operator classes. If the index's operator class had a different idea of what "equal" means, then you might have two rows that are identical according to the index operator class, but different according to the distribution key hash operator class, and stored on different segments. Modify the checks to also verify that the operator classes use the same equality operator.

Refactor so that we can share the code doing the checks between CREATE INDEX and ALTER TABLE SET DISTRIBUTED BY.

Fixes https://github.com/greenplum-db/gpdb/issues/6971
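
A hedged sketch of the kind of opclass mismatch the tightened check targets; the opclass below is imaginary, so treat this as illustrative only:

```sql
-- Hypothetical: "ci_text_ops" is an imaginary opclass whose equality is
-- case-insensitive, unlike the distribution key's hash opclass. A unique
-- index using it is now rejected on a table distributed by that column,
-- because 'abc' and 'ABC' could hash to different segments even though the
-- index considers them equal.
CREATE TABLE users (name text) DISTRIBUTED BY (name);
CREATE UNIQUE INDEX users_name_uq ON users (name ci_text_ops);  -- rejected
```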

- 04 Dec 2019, 2 commits

Committed by Chen Mulong

When the hostname cannot be resolved, getDnsAddress can return NULL, which should be checked before use.

Authored-by: Chen Mulong <muchen@pivotal.io>

Committed by Adam Lee

There is another check afterwards, with correct delim_off checking. Also take the chance to tidy up ProcessCopyOptions(), align it with upstream, and handle delim_off.
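
For context, a hedged sketch of the GPDB-specific "off" delimiter value that the delim_off handling refers to; the exact accepted spelling and placement are assumptions:

```sql
-- Assumption: GPDB accepts DELIMITER 'off' for single-column text data,
-- meaning input lines are not split into columns at all.
CREATE TABLE raw_lines (line text) DISTRIBUTED RANDOMLY;
COPY raw_lines FROM '/tmp/lines.txt' DELIMITER 'off';
```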

- 28 Nov 2019, 1 commit

Committed by Heikki Linnakangas

In the common case that the distribution key is one of the first columns in a table, COPY FROM can scale a lot better if we only parse, and call the input functions for, the first columns in the QD, and delay parsing the rest of the row to the QEs. The overall effort is the same, but the QD can easily become a bottleneck. GPDB 5 did this, but we lost it in the refactoring of the COPY code in the 9.1 merge. This brings it back.

Update the answer file for the sreh test. The master now rejects the tuple "bad|8|8", and the next bad tuple it sees is "eleven"; hence reject that one. Previously, since the master parsed full tuples, it also saw incorrect data in the non-distribution columns, and hence reported the error for an earlier tuple.

Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>

- 26 Nov 2019, 1 commit

Committed by Heikki Linnakangas

Seems less surprising.

Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>

- 25 Nov 2019, 1 commit

Committed by Heikki Linnakangas

Fixes github issue https://github.com/greenplum-db/gpdb/issues/9022

Reviewed-by: Asim R P <apraveen@pivotal.io>

- 08 Nov 2019, 1 commit

Committed by Heikki Linnakangas

- 30 Oct 2019, 1 commit

Committed by Heikki Linnakangas

We had an old GPDB_84_MERGE_FIXME comment questioning whether the behavior of this was correct, so I decided to finally see what happens. The behavior seems reasonable to me, so just add a few tests for it and remove the FIXME.

Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>

- 29 Oct 2019, 1 commit

Committed by Soumyadeep Chakraborty

This also applies the fix in efd76c4c to all callers of force_mirrors_to_catch_up(), thus eliminating flakes such as:

```
-- Ensure that the mirrors including the standby master have removed the dboid dir under the target tablespace.
SELECT gp_wait_until_triggered_fault('after_drop_database_directories', 1, dbid)
    FROM gp_segment_configuration WHERE role='m';
- gp_wait_until_triggered_fault
--------------------------------
- Success:
- Success:
- Success:
- Success:
-(4 rows)
-
+ERROR:  failed to inject fault: ERROR:  fault not triggered, fault name:'after_drop_database_directories' fault type:'wait_until_triggered' (gp_inject_fault.c:127)
+DETAIL:  DETAIL:  Timed-out as 10 minutes max wait happens until triggered.
```

- 28 Oct 2019, 1 commit

Committed by Paul Guo

1. Do not call elog(FATAL) in cleanupQE(), since it can be called from cdbdisp_destroyDispatcherState() to destroy a CdbDispatcherState. That leads to re-entering cdbdisp_destroyDispatcherState(), which is not supported. Change the code to return false instead, and sanity-check against the reentrance. Returning false should be OK, since that leads to the gang being destroyed, and thus the QE resources should be destroyed with it. Here is a typical stack of the reentrance:

    0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
    0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
    0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
    0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
    0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
    0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
    0x000000000053b8ab in AbortTransaction () at xact.c:3319
    0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
    0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
    0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
    0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
    0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
    0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
    0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
    0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
    0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
    0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
    0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
    0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442

2. Force dropping the reader gang for a named portal if a SET command happened previously, since that setting was not dispatched to that gang and thus we should not reuse it.

3. Now that we have the mechanism of destroying the DispatcherState in the resource owner callback when aborting a transaction, it is no longer needed to destroy it in some of the dispatcher code.

The added test cases and some existing test cases cover almost all of the code change, except the change in cdbdisp_dispatchX() (I could not find a way to test that, and I'll keep it in mind to see how to test that or similar code).

Reviewed-by: Pengzhou Tang
Reviewed-by: Asim R P

- 23 Oct 2019, 1 commit

Committed by Jinbao Chen

Commit b3b2797e added an `appendoptimized` alias for `appendonly`, but the corresponding change for partitioned tables was missed. Add it back.

- 09 Oct 2019, 1 commit

Committed by Heikki Linnakangas

The old way to construct a plan for a correlated subquery was to plan the subquery as usual, except that Index Scans were not allowed. Then, after constructing a Plan tree, the post-processing step apply_motion() added Redistribute Motion nodes so that the plan tree was executable on any node. That was pretty simplistic: disabling Index Scans completely obviously hurts performance. Also, because the planner didn't take into account that all base relations are actually redistributed everywhere, that was not reflected in the cost estimates, and the planner might choose a sub-optimal plan.

To improve on that, change the way this works so that the Motions are added earlier in planning. The planner needs to know which restrictions (WHERE clauses) refer to the outer query, i.e. which quals are correlated, and make sure that those quals are always evaluated in the same slice as the parent query. The outer query cannot pass parameters down through a Motion node, so the subquery plan must not contain any Motion nodes between the evaluation of the correlated variable and the outer plan.

This is enforced by a new CdbPathLocus type, "OuterQuery". A node with OuterQuery locus must be evaluated in the outer query. It is mostly the same as "general", which means that it can be evaluated anywhere, but with the restriction that it is not OK to redistribute an input that has OuterQuery locus. Whenever a plan node evaluates a correlated var, that node must have OuterQuery locus, achieved by adding Motions below that node. This has a similar effect as the old approach, but gives the planner a bit more flexibility, and the motions are taken into account in cost estimates. This allows using Index Scans in subplans, but only if the index quals don't contain correlated vars.

This still isn't perfect; it would sometimes be good to, for example, delay the evaluation of a correlated var until later, above a join node, because that might avoid expensive Redistribute Motions. Even though this patch doesn't allow such plans yet, it's a step in the right direction.

This moves the cdbllize() step to run *before* set_plan_references(). Now that we no longer add Motion nodes to an already-constructed plan tree, we don't need the 'useExecutorVarFormat' stuff in many functions anymore.

Fixes https://github.com/greenplum-db/gpdb/issues/8648

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
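
A minimal sketch of the kind of query affected, with hypothetical tables. Under the new scheme the correlated qual (t2.b = t1.b) must be evaluated in the outer query's slice, and an index scan on t2 is possible because the index qual itself is not correlated:

```sql
-- Hypothetical schema.
CREATE TABLE t1 (a int, b int) DISTRIBUTED BY (a);
CREATE TABLE t2 (a int, b int) DISTRIBUTED BY (a);
CREATE INDEX t2_a_idx ON t2 (a);

-- Correlated subquery: the reference to t1.b ties the subquery to the outer
-- row, so the subquery's plan may not put a Motion between the correlated
-- qual and the outer plan.
SELECT *
FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a = 42 AND t2.b = t1.b);
```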

- 04 Oct 2019, 1 commit

Committed by Asim R P

Spotted when analyzing a CI failure pertaining to another test.

Reviewed by Heikki and Georgios.

- 02 Oct 2019, 2 commits

Committed by Bhuvnesh Chaudhary

Committed by Bhuvnesh Chaudhary

Earlier, COPY <catalog table> FROM <file> was allowed irrespective of the value of allow_system_table_mods. This commit restricts such statements to the case where allow_system_table_mods is set to ON.

Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
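
A minimal sketch of the new gate; the file path is hypothetical, and whether the GUC is settable per-session is an assumption:

```sql
-- With allow_system_table_mods off (the default), this now fails:
COPY pg_statistic FROM '/tmp/heap01.stat';  -- ERROR

-- A superuser can opt in explicitly (assuming the GUC is session-settable):
SET allow_system_table_mods = on;
COPY pg_statistic FROM '/tmp/heap01.stat';
```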

- 26 Sep 2019, 1 commit

Committed by Ashwin Agrawal

The current code for COPY FROM picks COPY_DISPATCH mode for non-distributed/non-replicated tables as well, which causes a crash. It should use COPY_DIRECT, the normal/direct mode, for such tables.

The crash was exposed by the following SQL commands:

    CREATE TABLE public.heap01 (a int, b int) distributed by (a);
    INSERT INTO public.heap01 VALUES (generate_series(0,99), generate_series(0,98));
    ANALYZE public.heap01;

    COPY (select * from pg_statistic where starelid = 'public.heap01'::regclass) TO '/tmp/heap01.stat';
    DELETE FROM pg_statistic where starelid = 'public.heap01'::regclass;
    COPY pg_statistic from '/tmp/heap01.stat';

Important note: yes, it's known and strongly recommended not to touch `pg_statistic` or any other catalog table this way. But it's no good to panic either. After this change, the COPY into `pg_statistic` is going to ERROR out "correctly" instead of crashing, with `cannot accept a value of type anyarray`, as there just isn't any way at the SQL level to insert data into pg_statistic's anyarray columns.

Refer: https://www.postgresql.org/message-id/12138.1277130186%40sss.pgh.pa.us