- 14 Jul 2017, 4 commits
-
-
Committed by Alexandra Wang
Signed-off-by: John Gaskin <johntgaskin@gmail.com>
Signed-off-by: Todd Sedano <tsedano@pivotal.io>
-
Committed by Jimmy Yih
When a standby is shut down and restarted, WAL recovery starts from the last restartpoint. If we replay an AO write record which has a following drop record, the WAL replay of the AO write record will find that the segment file does not exist. To fix this, we piggyback on top of the heap solution of tracking invalid pages in the invalid_page_tab hash table. The hash table key struct uses a block number which, for AO's sake, we pretend is the segment file number for AO/AOCO tables. This solution will be revisited to possibly create a separate hash table for AO/AOCO tables with a proper key struct. Big thanks to Heikki for pointing out the issue.
-
Committed by Ashwin Agrawal
We generate AO XLOG records when --enable-segwalrep is configured. We should now replay those records on the mirror or during recovery. The replay is only performed for standby mode since promotion will not execute until after there are no more XLOG records to read from the WAL stream.
-
Committed by Heikki Linnakangas
As reported by @flochman. See github issue #2739.
-
- 13 Jul 2017, 12 commits
-
-
Committed by Heikki Linnakangas
Seems like a good thing to test. To avoid having to maintain separate ORCA and non-ORCA expected outputs, change the ORCA error message to match the one you get without ORCA.
-
Committed by Daniel Gustafsson
This removes code which is either unreachable, due to prior identical tests that break out of the codepath, or dead, due to always being true. Asserting that an unsigned integer is >= 0 will always succeed, so it's pointless. Per "logically dead code" gripes from Coverity.
-
Committed by Jimmy Yih
When running `gpsegwalrep.py start`, it would intermittently deadlock on the subprocess.check_output call. Apparently, concurrent subprocess.check_output calls can deadlock depending on what shell commands are run and how fast they execute. For now, fix the issue by only calling subprocess.check_output under a thread lock. Someone can revisit this later although it is assumed a proper tool will be created in the near future.
-
Committed by Abhijit Subramanya
If we try to inject certain faults when the system is initialized with filerep disabled, we get the following error:

```
gpfaultinjector error: Injection Failed: Failure: could not insert fault injection, segment not in primary or mirror role
Failure: could not insert fault injection, segment not in primary or mirror role
```

This patch removes the role check for non-filerep faults so that they don't fail on a cluster initialized without filerep.
-
Committed by Asim R P
The filerep resync logic that fetches changed blocks from the changetracking (CT) log is changed. The LSN is no longer used to filter out blocks from the CT log. If a relation's changed blocks exceed the threshold number of blocks that can be fetched at a time, the last fetched block number is remembered and used to form the subsequent batch.
-
Committed by Asim R P
Filerep resync works by obtaining, from the changetracking (CT) log, the blocks changed since a mirror went down. The changed blocks are obtained in fixed-size batches. Blocks of the same relation are ordered by block number. The bug occurs when a higher-numbered block of a relation is changed such that it has a lower LSN than the lower-numbered blocks, and that higher-numbered block is not included in the first batch of changed blocks for the relation. Such blocks miss being resynchronized to the mirror due to an incorrect filter based on the previously obtained changed blocks' LSN. That means the mirror is eventually declared in sync with the primary, but some changed blocks remain only on the primary. This loss of data manifests only when the mirror takes over as primary, upon rebalance or the primary going down.
-
Committed by Asim R P
The GUC gp_changetracking_max_rows replaces a compile-time constant. The resync worker obtains at most gp_changetracking_max_rows changed blocks from the changetracking log at one time. Controlling this with a GUC makes it possible to exercise, and thereby expose, bugs in the resync logic around this area.
-
Committed by mkiyama
-
Committed by mkiyama
-
Committed by mkiyama
-
Committed by mkiyama
-
Committed by mkiyama
- 12 Jul 2017, 3 commits
-
-
Committed by Adam Lee
-
Committed by Adam Lee
0.9.8 is EOL; the 1.0+ versions have many security and performance improvements.
-
Committed by Jesse Zhang
`enable-cassert` is your friend, yo
-
- 11 Jul 2017, 16 commits
-
-
Committed by Heikki Linnakangas
If you have a query like "SELECT COUNT(col1) FROM wide_table", where the table has dozens of columns, the overhead in aocs_getnext() just to figure out which columns need to be fetched becomes noticeable. Optimize it.
-
Committed by Heikki Linnakangas
There was a mixture of spaces and tabs being used for indentation in aocsam.c, and I finally got fed up with that while doing other changes in that file. I ran pgindent, and did a bunch of manual fixups of the formatting. All the changes in this commit are purely cosmetic. I did the same for appendonlyam.c, although I'm not changing it at the moment, to keep aocsam.c and appendonlyam.c in sync.
-
Committed by Heikki Linnakangas
In aocsam.c, there's a block of code that does:

```
if (...)
{
    AOTupleIdInit_rowNum(...);
}
else
{
    AOTupleIdInit_rowNum(...);
}
```

While hacking, I removed the seemingly unnecessary braces, turning that into just:

```
if (...)
    AOTupleIdInit_rowNum(...);
else
    AOTupleIdInit_rowNum(...);
```

But then I got a compiler error about 'else' without 'if'. I was baffled for a moment, until I looked at the definition of AOTupleIdInit_rowNum: the way it includes curly braces makes it not work in an if-else construct like the above. These macros also have double-evaluation hazards. To make this more robust, turn the macros into static inline functions. Inline functions generally behave more sanely and are more readable than macros.
-
Committed by Heikki Linnakangas
This does mean that we don't free the array quite as quickly as we used to, but it's a drop in the sea. The array is very small, there are much bigger data structures involved in every AOCS scan that are not freed as quickly, and it's freed at the end of the query in any case.
-
Committed by Heikki Linnakangas
Commit fa6c2d43 added two functions, but forgot to add prototypes for them.
-
Committed by Adam Lee
This is important for debugging customers' issues. (The log level still matters.)
-
Committed by Ming LI
1. Log the raw string if it can't be decoded as unicode.
2. If a similar exception occurs in log(), continue processing the remaining log output with a warning.
3. If another exception occurs in CatThread, log the thread exit without blocking the worker process, and report the warning "gpfdist log halt because Log Thread got an exception:".
-
Committed by Marbin Tan
Create a more extensive workload for the SQL to make it last longer. The previous SQL was completing too fast, so by the time the actual pid read happened, the pid no longer existed, causing the result to be 0.
-
Committed by Venkatesh Raghavan
-
Oops we broke the tests sorry :( This reverts commit 97db5bdd.
-
Committed by Kavinder Dhaliwal
Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
-
Committed by Chuck Litzell
* Pivotal GSS name change to Pivotal Support
* Change the Greenplum Customer Support reference to a Warning, as in the user doc
-
Committed by John Gaskin
Signed-off-by: Shivram Mani <shivram.mani@gmail.com>
-
Committed by Nadeem Ghani
Work around a problem discovered by a client who noticed intermittent gpssh errors when some nodes became very CPU-bound. In particular, we override the way the ssh command prompt is validated on a remote machine within gpssh. The vendored module 'pexpect' tries to match 2 successive prompts from an interactive bash shell. However, if the target host is slow from CPU or network loading, these prompts may return late. In that case, the override retries several times, extending the timeout from the default 1 second to up to 125 times that duration. Experimentally, these added retries seem to tolerate about 1 second of delay, testing with a 'tc' command that slows network traffic artificially. The number of retries can be configured.
- Add unit tests to verify the happy path of ssh-ing to localhost
- Add a module for gpssh, for overriding pexpect (pxxssh)
- Add a readme describing the testing technique of using 'tc' to delay the network
Signed-off-by: Larry Hamel <lhamel@pivotal.io>
-
Committed by Larry Hamel
Also added a unit test.
Signed-off-by: Nadeem Ghani <nghani@pivotal.io>
-
Committed by Nadeem Ghani
Signed-off-by: Larry Hamel <lhamel@pivotal.io>
-
- 10 Jul 2017, 2 commits
-
-
Committed by xiong-gang

```
CREATE RESOURCE GROUP rg1 WITH (concurrency=1, cpu_rate_limit=10, memory_limit=10);
CREATE ROLE r1 RESOURCE GROUP rg1;

session 1: SET ROLE r1;
           BEGIN;
session 2: BEGIN;   <--- hang, and then cancel
           BEGIN;   <--- assertion failure
```

Signed-off-by: Ning Yu <nyu@pivotal.io>
-
Committed by Richard Guo
Memory usage statistics in resource groups are defined as unsigned integers. For a subtraction 'a - b' on memory usage, the atomic subtraction function 'pg_atomic_sub_fetch_*' will return the value of 'a' from before the subtraction. This value is then asserted to be no less than 'b'.
-
- 07 Jul 2017, 3 commits
-
-
Committed by Adam Lee
-
Committed by Adam Lee
-
Committed by Ning Yu
Change initial contents in pg_resgroupcapability:
* Remove memory_redzone_limit;
* Add memory_shared_quota, memory_spill_ratio.

Change the resgroup concurrency range to [1, 'max_connections']:
* The original range was [0, 'max_connections'], where -1 meant unlimited.
* Now the range is [1, 'max_connections'], and -1 is not supported.

Change resgroup limit types from float to int. The following resgroup resource limit types changed from float to int percentage values:
* cpu_rate_limit;
* memory_limit;
* memory_shared_quota;
* memory_spill_ratio.
-