- 12 Mar 2019, 1 commit
-
-
Committed by Georgios Kokolatos
The function was using an anti-pattern where an argument is a fixed-length char array. Array arguments in C don't really exist: compilers accept them, but without any 'const' or 'static' decorators they don't always emit a warning. The proposed patch only fixes the unsafeness of the function declaration; it does not address its usefulness. Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
- 05 Mar 2019, 1 commit
-
-
Committed by Pengzhou Tang
Commit 4eb65a53 brought GPDB the ability to distribute tables on a subset of segments. That commit took good care of replicated tables for SELECT/UPDATE/INSERT on a subset, but COPY FROM was not handled. For COPY FROM on a replicated table, only one replica should be picked to provide data, and the QE whose segid matches gp_session_id % segment_size is chosen; obviously, for a table on a subset of segments, an invalid QE might be chosen. To fix this, the real numsegments of the replicated table should be used instead of the current segment size. What's more, the dispatcher can now allocate gangs on a subset of segments, so the QD can directly allocate the picked gang, and a QE no longer needs to care about whether it should provide data. For COPY TO on a replicated table, we should also allocate the correct QEs matching the numsegments of the replicated table. Co-authored-by: Gang Xiong <gxiong@pivotal.io>
-
- 13 Feb 2019, 1 commit
-
-
Committed by Heikki Linnakangas
-
- 14 Jan 2019, 1 commit
-
-
Committed by Heikki Linnakangas
QE processes largely don't care about client_encoding, because query results are sent through the interconnect, except for a few internal commands, and the query text in QD->QE messages is presumed to already be in the database encoding. But there were a couple of cases where it mattered. Error messages generated in QEs were being converted to client_encoding, but the QD assumed that they were in the server encoding. Now that the QEs don't know the user's client_encoding, COPY TO needs changes: in COPY TO, the QEs are responsible for forming the rows in the final client_encoding, so the QD now needs to explicitly use the COPY's ENCODING option when it dispatches the COPY to QEs. The COPY TO handling wasn't quite right even before this patch; it showed up as a regression failure in the src/test/mb/mbregress.sh 'sjis' test. When client_encoding was set with PGCLIENTENCODING, however, it wasn't set correctly in the QEs, which showed up as incorrectly encoded COPY output. Now that we always set it to match the database encoding in QEs, that's moot. While we're at it, change the mbregress test so that it's not sensitive to row orderings, and make atmsort.pm more lenient, so it recognizes "COPY <tablename> TO STDOUT" even when the tablename contains non-ascii characters. These changes were needed to make the src/test/mb/ tests pass cleanly. Fixes https://github.com/greenplum-db/gpdb/issues/5241. Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/WPmHXuU9T94/gvpNOE73FwAJ
Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
-
- 11 Jan 2019, 1 commit
-
-
Committed by Heikki Linnakangas
We cleared the local 'useHeapMultiInsert' variable as soon as we saw at least one AO partition, even if we had already buffered tuples for a heap partition earlier. As a result, we didn't flush the multi-insert buffer at the end of the COPY. Fixes https://github.com/greenplum-db/gpdb/issues/6678
Reviewed-by: Adam Lee <ali@pivotal.io>
-
- 25 Dec 2018, 1 commit
-
-
Committed by David Kimura
* Make SendTuple do de-toasting for heap tuples as well as memtuples

Motion node didn't consistently handle where MemTuples and HeapTuples are de-toasted. This change delays de-toasting of MemTuples in the motion sender until SerializeTupleDirect(), which is where HeapTuples are de-toasted.

This change also corrects a bug in de-toasting of memtuples in the motion sender. De-toasting of memtuples marked the slot containing the de-toasted memtuple as virtual. When a TupleTableSlot is marked virtual, its values array should point to addresses within the memtuple. The bug was that the values array was populated *before* de-toasting, and the memory used by the toasted memtuple was freed. Subsequent usages of the slot dereferenced invalid memory through the pointers in the values array. This change corrects the issue by simplifying ExecFetchSlotMemTuple() so that it no longer manages the memory required to de-toast, which also aligns the code structure more closely with ExecFetchSlotMinimalTuple() in Postgres.

An alternative solution we considered was to continue de-toasting in ExecFetchSlotMemTuple() and clear the virtual bit before freeing the tuple. A disadvantage of that approach is that it modifies an existing slot and complicates the contract of memtuple and virtual table slots. It also doesn't match where HeapTuples are de-toasted, so the contract becomes less clear.

The bug is hard to discover because the motion node in the executor is the only place where a memtuple is de-toasted before being sent over the interconnect, and only a few plan nodes make use of a tuple that has already been returned to parent nodes, e.g. Unique (DISTINCT ON()) and SetOp (INTERSECT / EXCEPT). Example plan that manifests the bug:

 Gather Motion 3:1
   Merge Key: a, b
   ->  Unique
         Group By: a, b
         ->  Sort
               Sort Key (Distinct): a, b
               ->  Seq Scan

The Unique node uses the previous tuple, de-toasted by the Gather Motion sender, to compare with the new tuple obtained from the Sort node when eliminating duplicates.
Co-authored-by: Asim R P <apraveen@pivotal.io>
-
- 24 Dec 2018, 1 commit
-
-
Committed by Adam Lee
```
>>> Using uninitialized value "useHeapMultiInsert".
4130         if (useHeapMultiInsert)
```
Also, refine the surrounding lines to make the logic clear.
-
- 15 Dec 2018, 1 commit
-
-
Committed by Heikki Linnakangas
Make both of them return a bool. Now they're consistent, and a bool seems most useful for the callers. Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io> Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
-
- 14 Dec 2018, 1 commit
-
-
Committed by Heikki Linnakangas
-
- 13 Dec 2018, 1 commit
-
-
Committed by Daniel Gustafsson
The Greenplum-specific error handling via ereport()/elog() calls was in need of a unification effort, as some parts of the code were using a different messaging style from others (and from upstream). This aims at bringing many of the GPDB error calls in line with the upstream error message writing guidelines, and thus makes the user experience of Greenplum more consistent. The main contributions of this patch are:

* errmsg() messages shall start with a lowercase letter and not end with a period. errhint() and errdetail() shall be complete sentences starting with a capital letter and ending with a period. This attempts to fix as many ereport() calls as possible, with overly detailed errmsg() content broken up into details and hints where possible.

* Reindent ereport() calls to be more consistent with the common style used in upstream and most parts of Greenplum:

    ereport(ERROR,
            (errcode(<CODE>),
             errmsg("short message describing error"),
             errhint("Longer message as a complete sentence.")));

* Avoid breaking messages across source lines just to keep lines short, since that makes grepping for error messages harder when debugging. This is also the de facto standard in upstream code.

* Convert a few internal error ereport() calls to elog(). There are no doubt more that can be converted, but the low-hanging fruit has been dealt with. Also convert a few user-facing elog() calls to ereport().

* Update the test files to match the new messages.

Spelling and wording are mostly left for a follow-up commit, as this was getting big enough as it was. The most obvious cases have been handled, but there is work left to be done here. Discussion: https://github.com/greenplum-db/gpdb/pull/6378
Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io> Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
- 12 Dec 2018, 2 commits
-
-
Committed by Heikki Linnakangas
We store the distribution policy in segments nowadays (since commit 7efe3204), so this is no longer needed. Reviewed-by: Adam Lee <ali@pivotal.io>
-
Committed by Heikki Linnakangas
This code is performance-critical, as the QD will quickly become the bottleneck when loading data with COPY FROM, so optimizing it is warranted. I did some quick testing on my laptop, loading a table with 50 boolean columns; this commit improved the performance of that by about 10%. In GPDB 5, we used to postpone calling the input functions for columns that were not needed to determine which segment to send the row to. We lost that optimization during the 9.1 merge, when the COPY code was heavily refactored, and we now call the input functions for every column in the QD, which added a lot of overhead there. On the other hand, some other things got faster. Before this commit, the test case with 50 booleans was about 10% slower in GPDB 6 compared to GPDB 5, so this buys back that performance, making it about the same speed again. I'm sure you can find a test case where GPDB 6 is still slower - you can make the input functions arbitrarily slow, after all - but every little helps. Reviewed-by: Adam Lee <ali@pivotal.io>
-
- 07 Dec 2018, 5 commits
-
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
The 9.1 merge removed the code that incremented 'num_consec_csv_err'. Since then, it was only ever initialized to 0, so all the related code was dead. I'm not sure how we now behave with the kinds of errors that used to trigger this code, but I don't see a need to treat them specially; the generic error handling code should cope with them. The GUC that controlled the max CSV line length was already removed in commit 6ac29fdc.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
They were only used to pass the information to cdbCopyStart(). Better to pass them directly as arguments.
-
Committed by Heikki Linnakangas
-
- 04 Dec 2018, 1 commit
-
-
Committed by Heikki Linnakangas
We can use the more straightforward cdbhashrandomseg() instead. Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
- 03 Dec 2018, 1 commit
-
-
Committed by Heikki Linnakangas
This was previously not implemented, because the gp_distribution_policy catalog was not populated in QE segments. Starting with commit 7efe3204, it is, so we can easily support this now. Reviewed-by: Xiaoran Wang <xiwang@pivotal.io> Reviewed-by: Adam Lee <ali@pivotal.io>
-
- 29 Nov 2018, 7 commits
-
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
We might need it in the future again, but as it stands, it's just dead code. If we want to resurrect the optimization that needed it, we can put it back at that point. (I'm not 100% convinced it was correct, it looked a bit funky..)
-
Committed by Heikki Linnakangas
GetAttrContext was outright unused, and PartitionData was filled in and passed around, but not used for anything.
-
Committed by Heikki Linnakangas
'p_attr_types' and 'part_attr_types' fields are unused. 'p_nattrs' is redundant, you can look at policy->nattrs instead.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
Melanie pointed out that there is no existing test for this scenario. Also expand the comment in the code to explain this scenario.
-
Committed by Heikki Linnakangas
The control flow and the comment was pretty confusing. Rearrange it. Per comments from @melanieplageman and @dgkimura.
-
- 14 Nov 2018, 1 commit
-
-
Committed by Daniel Gustafsson
Erroring out via ereport(ERROR ..) will clean up resources allocated during execution, so explicitly freeing right before it is not useful (unless the allocation is in the TopMemoryContext). Remove pfree() calls for lower allocations, and reorder one to happen just after a conditional ereport instead, to make for slightly easier debugging when breaking on the error. Reviewed-by: Jacob Champion <pchampion@pivotal.io>
-
- 13 Nov 2018, 1 commit
-
-
Committed by Jinbao Chen
In 'copy (select statement) to file', we generate a query plan, set its dest receiver to copy_dest_receiver, and run the dest receiver on the QD. In 'copy (select statement) to file on segment', we modify the query plan, delete the gather motion, and let the dest receiver run on the QEs. Change 'isCtas' in Query to 'parentStmtType' to be able to mark the type of the parent utility statement. Add a CopyIntoClause node to store copy information, and add copyIntoClause to PlannedStmt. In Postgres, we don't need to make a different query plan for a query contained in a utility statement, but in Greenplum we do, so we use a field to indicate whether the query is contained in a utility statement, and the type of that statement. The behavior of 'copy (select statement) to file on segment' is actually very similar to 'SELECT ... INTO ...' and 'CREATE TABLE ... AS SELECT ...': we use the distribution policy inherent in the query result as the final data distribution policy; if there is none, we use the first column in the target list as the key and redistribute. The only difference is that we use 'copy_dest_receiver' instead of 'intorel_dest_receiver'.
-
- 06 Nov 2018, 1 commit
-
-
Committed by Heikki Linnakangas
Avoids looking through domains, array types, etc. on every call. That seems like a more sensible API, since the data types don't change during the lifetime of a CdbHash. Make cdbhash() more convenient for callers by handling NULLs within the function; this way the callers don't need to do the NULL check and call either cdbhash() or cdbhashnull(). This also fixes the performance issue caused by the syscache lookups reported in https://github.com/greenplum-db/gpdb/issues/5961: the type's type is now checked only once, when the CdbHash object is initialized, instead of for every row. Reviewed-by: Melanie Plageman <mplageman@pivotal.io> Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
-
- 05 Nov 2018, 1 commit
-
-
Committed by Heikki Linnakangas
Notes in the test case about backslash escaping:
- Need to add ESCAPE 'OFF' to COPY ... PROGRAM
- echo behaves differently on different platforms; force the use of the bash shell with the -E option.
Signed-off-by: Ming LI <liming01@gmail.com>
-
- 29 Oct 2018, 1 commit
-
-
Committed by Heikki Linnakangas
Most callers were passing CurrentMemoryContext, so this makes most callers slightly simpler. The few places that needed to pass a different context now switch to the correct one before calling the GpPolicy*() function. Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
-
- 25 Oct 2018, 1 commit
-
-
Committed by Tang Pengzhou
* Don't use GpIdentity.numsegments directly for the number of segments; use getgpsegmentCount() instead.

* Unify the way to fetch/manage the number of segments. Commit e0b06678 lets us expand a GPDB cluster without a restart, so the number of segments may change during a transaction, and we need to take care of numsegments. We currently have two ways to get the segment count: 1) from GpIdentity.numsegments, and 2) from gp_segment_configuration (cdb_component_dbs), which the dispatcher uses to decide the range of segments to dispatch to. We did some hard work in e0b06678 to update GpIdentity.numsegments correctly, which made the management of segments more complicated; now we use an easier way:
1. We only allow getting segment info (including the number of segments) through gp_segment_configuration, which has the newest segment info. There is no need to update GpIdentity.numsegments; it is kept only for debugging and can be removed entirely in the future.
2. Each global transaction fetches/updates the newest snapshot of gp_segment_configuration and never changes it until the end of the transaction unless a writer gang is lost, so a global transaction sees a consistent state of the segments. We used to use gxidDispatched to do the same thing; now it can be removed.

* Remove GpIdentity.numsegments, which takes no effect now. This commit does not remove gp_num_contents_in_cluster, because that requires modifying utilities like gpstart, gpstop, and gprecoverseg; let's do that cleanup in another PR.

* Exchange the default UP/DOWN value in the FTS cache. Previously, the FTS prober read gp_segment_configuration, checked the status of segments, and then set the status of segments in the shared memory struct named ftsProbeInfo->fts_status[], so other components (mainly the dispatcher) could detect that a segment was down. All segments were initialized as DOWN and then updated to UP in the most common cases, which brings two problems: 1. fts_status is invalid until FTS completes its first loop, so the QD needs to check ftsProbeInfo->fts_statusVersion > 0. 2. When gpexpand adds a new segment to gp_segment_configuration, the new segment may be marked as DOWN if FTS hasn't scanned it yet. This commit changes the default value from DOWN to UP, which resolves both problems.

* FTS should not be used to notify backends that a gpexpand has occurred. As Ashwin mentioned in PR#5679, "I don't think giving FTS responsibility to provide new segment count is right. FTS should only be responsible for HA of the segments. The dispatcher should independently figure out the count based on catalog. gp_segment_configuration should be the only way to get the segment count". FTS should be decoupled from gpexpand.

* Access gp_segment_configuration inside a transaction.

* Upgrade the log level from ERROR to FATAL if the expand version changed.

* Modify gpexpand test cases according to the new design.
-
- 19 Oct 2018, 1 commit
-
-
Committed by Heikki Linnakangas
If an error occurred in the segments, in a "COPY <table> TO <file>" command, the COPY was stopped, but the error was not reported to the user. That gave the false impression that it finished successfully, but what you actually got was an incomplete file. A test case is included. It uses a little helper output function that sometimes throws an error. Output functions are fairly unlikely to fail, but it could happen e.g. because of an out of memory error, or a disk failure. The "COPY (SELECT ...) TO <file>" variant did not suffer from this (otherwise, a query that throws an error would've been a much simpler way to test this). The reason for this was that the code in cdbCopyGetData() that called PQgetResult(), and extracted the error message from the result, didn't indicate to the caller in any way that the error happened. To fix, delay the call to PQgetResult() to a later call to cdbCopyEnd(), which already had the logic to extract the error information from the PGresult and throw it to the user. While we're at it, refactor cdbCopyEnd a little bit, to give the callers a nicer function signature. I also changed a few places that used a 32-bit int to store rejected row counts to use int64 instead. There was a FIXME comment about that. I didn't fix all the places that do that, though, so I moved the FIXME to one of the remaining places. Apply to master branch only. GPDB 5 didn't handle this too well, either; with the included test case, you got an error like this:

postgres=# copy broken_type_test to '/tmp/x';
ERROR:  missing error text

That's not very nice, but at least you get an error, even if it's not a very good one. The code looks quite different in 5X_STABLE, so I'm not going to attempt improving that. Reviewed-by: Adam Lee <ali@pivotal.io>
-
- 28 Sep 2018, 1 commit
-
-
Committed by ZhangJackey
There was an assumption in gpdb that a table's data is always distributed on all segments. However, this is not always true: for example, when a cluster is expanded from M segments to N (N > M), all the tables are still on M segments. To work around the problem, we used to have to alter all the hash-distributed tables to randomly distributed to get correct query results, at the cost of bad performance. Now we support table data being distributed on a subset of segments. A new column `numsegments` is added to the catalog table `gp_distribution_policy` to record how many segments a table's data is distributed on. By doing so we can allow DMLs on M tables; joins between M and N tables are also supported.
```sql
-- t1 and t2 are both distributed on (c1, c2),
-- one on 1 segment, the other on 2 segments
select localoid::regclass, attrnums, policytype, numsegments
  from gp_distribution_policy;

 localoid | attrnums | policytype | numsegments
----------+----------+------------+-------------
 t1       | {1,2}    | p          |           1
 t2       | {1,2}    | p          |           2
(2 rows)

-- t1 and t1 have exactly the same distribution policy,
-- join locally
explain select * from t1 a join t1 b using (c1, c2);

                   QUERY PLAN
------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Seq Scan on t1 b
 Optimizer: legacy query optimizer

-- t1 and t2 are both distributed on (c1, c2),
-- but as they have different numsegments,
-- one has to be redistributed
explain select * from t1 a join t2 b using (c1, c2);

                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 1:1  (slice2; segments: 1)
   ->  Hash Join
         Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
         ->  Seq Scan on t1 a
         ->  Hash
               ->  Redistribute Motion 2:1  (slice1; segments: 2)
                     Hash Key: b.c1, b.c2
                     ->  Seq Scan on t2 b
 Optimizer: legacy query optimizer
```
-
- 06 Sep 2018, 1 commit
-
-
Committed by Heikki Linnakangas
It used to always say "COPY 0", instead of the number of rows copied. This source line was added in PostgreSQL 9.0 (commit 8ddc05fb), but it was missed in the merge. Add a test case to check the command tags of different variants of COPY, including this one.
-
- 05 Sep 2018, 1 commit
-
-
Committed by Jim Doty
The "unknown" type has an attlen of -2, which signifies that the actual length is determined by strlen(). We weren't handling this case, so handle it now. Co-authored-by: Jacob Champion <pchampion@pivotal.io>
-
- 15 Aug 2018, 1 commit
-
-
Committed by xiong-gang
* Remove ERRCODE_GP_FEATURE_NOT_SUPPORTED and use ERRCODE_FEATURE_NOT_SUPPORTED instead
* Remove ERROR_INVALID_WINDOW_FRAME_PARAMETER and use ERRCODE_WINDOWING_ERROR instead
Co-authored-by: Alexandra Wang <lewang@pivotal.io> Co-authored-by: Gang Xiong <gxiong@pivotal.io>
-
- 14 Aug 2018, 1 commit
-
-
Committed by Pengzhou Tang
Previously, COPY used CdbDispatchUtilityStatement directly to dispatch 'COPY' statements to all QEs and then sent/received data from the primaryWriterGang. This happened to work because the primaryWriterGang is not recycled when a dispatcher state is destroyed, but it seems nasty because the COPY command has logically finished by then. This commit splits the COPY dispatching logic into two parts to make it more reasonable.
-
- 03 Aug 2018, 2 commits
-
-
Committed by Daniel Gustafsson
The definitions of PQArgBlock in libpq.h and libpq-fe.h were in conflict with each other when including both. The definition in libpq.h was superfluous and removed in 23c7b583, so remove the redefines to clean up the code. Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
Committed by Karen Huddleston
This reverts commit 4750e1b6.
-