- 13 Jan 2018, 3 commits
-
-
Committed by Heikki Linnakangas
This hopefully fixes the gp_replica_check failures we're seeing in the pipeline.
-
Committed by Heikki Linnakangas
* Revert almost all the changes in smgr.c / md.c, to not go through the Mirrored* APIs.
* Remove mmxlog stuff. Use upstream "pending relation deletion" code instead.
* Get rid of multiple startup passes. Now it's just a single pass, like in the upstream.
* Revert the way database drop/create are handled to the way it is in upstream. Doesn't use PT anymore, but accesses the file system directly, and WAL-logs a single CREATE/DROP DATABASE WAL record.
* Get rid of MirroredLock.
* Remove a few tests that were specific to persistent tables.
* Plus a lot of little removals and reverts to upstream code.
-
Committed by Ashwin Agrawal
This was a little painful to untangle, but it seems done now. If any shake-up happens, this should be the primary suspect.
-
- 21 Nov 2017, 1 commit
-
-
Committed by Heikki Linnakangas
For clarity, and to make merging easier. The code to manage the hash table of "pending resync EOFs" for append-only tables is moved to smgr_ao.c.

One notable change here is that the pendingDeletesPerformed flag is removed. It was used to track whether there were any pending deletes or pending AO table resyncs, but we might as well check the pending-delete list and the pending-syncs hash table directly; it's hardly any slower than checking a separate boolean.

There are still plenty of GPDB changes in smgr.c, but this is a good step forward.
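The boolean-removal idea above can be sketched in a few lines. This is a hypothetical stand-in, not the real smgr.c structures, showing why a separate flag is redundant when the containers themselves can be inspected:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real smgr.c state: a linked list of
 * pending relation deletes and a hash table of pending AO resyncs. */
typedef struct PendingDelete { struct PendingDelete *next; } PendingDelete;
typedef struct { long num_entries; } SyncHashTable;

static PendingDelete *pendingDeletes = NULL;
static SyncHashTable pendingSyncs = { 0 };

/* Instead of maintaining a separate pendingDeletesPerformed boolean
 * alongside the containers, inspect both containers directly. */
static int
smgrHasPendingWork(void)
{
    return pendingDeletes != NULL || pendingSyncs.num_entries > 0;
}
```

The flag could silently get out of sync with the containers; deriving the answer from the containers cannot.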
-
- 24 Jun 2017, 1 commit
-
-
Committed by Ashwin Agrawal
-
- 07 Mar 2017, 3 commits
-
-
Committed by Ashwin Agrawal
Rename checkpoint.c to checkpointer.c, move the code from bgwriter.c to checkpointer.c, and rename most of the corresponding data structures to reflect the clear ownership and association. This commit brings it as close as possible to PostgreSQL 9.2.

Related PostgreSQL commits:
- 806a2aee: Split work of bgwriter between 2 processes: bgwriter and checkpointer.
- bf405ba8: Add new file for checkpointer.c
- 8f28789b: Rename BgWriterShmem/Request to CheckpointerShmem/Request
- d843589e5ab361dd4738dab5c9016e704faf4153: Fix management of pendingOpsTable in auxiliary processes.
-
We had partially pulled the fix to separate the checkpointer and bgwriter processes, and introduced a bug where pendingOpsTable was maintained in both processes. The pendingOpsTable records pending fsync requests; only the checkpointer process should keep it, while the bgwriter should only write out dirty pages to the OS cache. Upstream had this same bug, and it was fixed in d843589e5ab361dd4738dab5c9016e704faf4153.

Also ensure that the background writer sweeps buffers even in its first run after a checkpoint. There is no reason to hold off until the next run, and this is how it works upstream.

Fixes the issue discussed on the mailing list: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/PHKuQPNwWs0
-
The commit includes a UDF to walk dirty shared buffers, a new fault `fault_counter` to count the number of files fsync'ed by the checkpointer process, and another new fault `bg_buffer_sync_default_logic` to flush all buffers in BgBufferSync() for the background writer process.
-
- 03 Mar 2017, 1 commit
-
-
In PostgreSQL, the unlink is deferred and handled at checkpoint time. In GPDB, the unlink has always been handled by persistent tables, which protects against duplicate relfilenode deletion during recovery. Functionally, there is no harm in sending the unlink to the checkpointer process, and the checkpointer cannot even find the relfilenode to be deleted. However, this is a performance impact in scenarios like deleting a table with a large number of partitions, where the fsync request queue is unnecessarily filled. The detailed discussions are at: https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/mdunlink%7Csort:relevance/gpdb-dev/PHKuQPNwWs0/1kIwDk-CEgAJ
-
- 20 Dec 2016, 1 commit
-
-
Committed by Heikki Linnakangas
This commit substantially rewrites pg_upgrade to handle upgrading a Greenplum cluster from 4.3 to 5.0. The Greenplum specifics of pg_upgrade are documented in contrib/pg_upgrade/README.gpdb. A summary of the changes:

- Make pg_upgrade pass the pre-checks against GPDB 4.3.
- Restore the dumped schema in utility mode: pg_upgrade is executed on a single server in offline mode, so ensure we are using utility mode.
- Disable pg_upgrade checks that don't apply when upgrading to 8.3: when support for upgrading to Greenplum 6.0 is added, the checks that make sense to backport will need to be re-added.
- Support AO/AOCS tables: this bumps the AO table version number and adds a conversion routine for numeric attributes. The on-disk format of numerics changed between PostgreSQL 8.3 and 8.4. With this commit, we can distinguish between AO segments created in the old format and the new, and read both formats; new AO segments are always created in the new format. Also performs a check for AO tables that have NUMERIC attributes but no free segfiles: since AO table segments cannot be rewritten if there are no free segfiles, a warning is issued if such a table is encountered during the upgrade.
- Add code to convert heap pages offline: bumps the heap page format version number. While this isn't strictly necessary when we're doing the conversion offline, it reduces confusion if something goes wrong.
- Add a check for the money datatype: the upgrade doesn't support the money datatype, so check for its presence and abort the upgrade if found.
- Create new Oids in the QD and pass the new Oids in the dump for pg_upgrade on the QEs: when upgrading from GPDB 4 to 5, we need to create new array types for the base relation rowtypes in the QD, but we also need to dispatch these new Oids to the QEs. Objects assigning InvalidOid in the Oid dispatcher will cause a new Oid to be assigned. Once the new cluster is restored, the new Oids are dumped into a separate dumpfile which isn't unlinked on exit. If this file is placed in the cwd of pg_upgrade on the QEs, it will be pulled into the db dump and used during restore, thus "dispatching" the Oids from the QD even though they are offline. pg_upgrade doesn't at this point know whether it's running on a QD or a QE, so it will always dump this file and include the InvalidOid markers.
- gp_relation_node is reset and rebuilt during upgrade, once the data files from the old cluster are available to the new cluster. This change required altering how checkpoints are requested in the backend.
- Mark indexes as invalid to ensure they are rebuilt in the new cluster.
- Copy pg_distributedlog from the old cluster to the new during upgrade: we need the distributed log in the new cluster to be able to start up once the upgrade has pulled over the clog.
- Don't delete dumps when running with --debug: while not specific to Greenplum, this is a local addition which greatly helps testing and development of pg_upgrade.

For testing purposes, a small test cluster created with Greenplum 4.3 is included in contrib/pg_upgrade/test.

Heikki Linnakangas, Daniel Gustafsson and Dave Cramer
-
- 10 May 2016, 1 commit
-
-
Committed by Daniel Gustafsson
MirroredBufferedPool_Truncate() returns a boolean, so testing for (!returnvalue < 0) will always evaluate to false, which is not what we expect it to test. Per Github user fengttt.
-
- 26 Nov 2015, 1 commit
-
-
Committed by Heikki Linnakangas
Move a comment outside local block, so that the indentation matches upstream. Remove some unnecessary #includes.
-
- 28 Oct 2015, 1 commit
-
-
- 26 Jun 2012, 1 commit
-
-
Committed by Robert Haas
This backports commit 7f242d88, except for the counter in pg_stat_bgwriter. The underlying problem (namely, that a full fsync request queue causes terrible checkpoint behavior) continues to be reported in the wild, and this code seems to be safe and robust enough to risk back-porting the fix.
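The compaction the backported commit performs can be sketched as follows. This is a toy illustration of the semantics, not the real implementation (which scans the shared queue using a hash table for the duplicate check): when the fsync request queue fills up, drop duplicate entries instead of forcing the requesting backend to perform the fsync itself.

```c
#include <assert.h>

/* One entry in a (simplified) shared fsync request queue. */
typedef struct
{
    unsigned rnode;  /* which relation file */
    unsigned segno;  /* which 1GB segment of it */
} FsyncRequest;

/*
 * Compact the queue in place by dropping duplicate entries, keeping
 * the first occurrence of each.  The O(n^2) scan is for clarity only.
 * Returns the new queue length.
 */
static int
compact_fsync_queue(FsyncRequest *queue, int n)
{
    int out = 0;

    for (int i = 0; i < n; i++)
    {
        int dup = 0;

        for (int j = 0; j < out; j++)
        {
            if (queue[j].rnode == queue[i].rnode &&
                queue[j].segno == queue[i].segno)
            {
                dup = 1;
                break;
            }
        }
        if (!dup)
            queue[out++] = queue[i];
    }
    return out;
}
```

Dropping duplicates is safe because a single fsync of a file satisfies every pending request for that file.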
-
- 27 Jun 2009, 1 commit
-
-
Committed by Tom Lane
archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom
-
- 26 Jun 2009, 1 commit
-
-
Committed by Heikki Linnakangas
during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active.

When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that.

Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already.

This fixes bug #4879 reported by Fujii Masao.
-
- 11 Jun 2009, 1 commit
-
-
Committed by Bruce Momjian
provided by Andrew.
-
- 12 Mar 2009, 1 commit
-
-
Committed by Tom Lane
some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.
-
- 12 Jan 2009, 1 commit
-
-
Committed by Tom Lane
GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) Greg Stark
-
- 02 Jan 2009, 1 commit
-
-
Committed by Bruce Momjian
-
- 17 Dec 2008, 1 commit
-
-
Committed by Bruce Momjian
includes a few new ones.
- Fixed compilation errors on OS X for probes that use typedefs
- Fixed a number of probes to pass ForkNumber per the relation forks patch
- The new probes are those that were taken out from the previous submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch.

Robert Lor
-
- 14 Nov 2008, 1 commit
-
-
Committed by Heikki Linnakangas
before passing it to elog.
-
- 11 Nov 2008, 1 commit
-
-
Committed by Heikki Linnakangas
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.
-
- 11 Aug 2008, 1 commit
-
-
Committed by Heikki Linnakangas
of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.
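The fork naming scheme this introduces can be sketched as below. This is a simplified, hypothetical version of the path construction (the real code is relpath() and friends); it shows how each fork of a relation becomes a separate file distinguished by a suffix, e.g. "base/11517/3767" for the main fork and "base/11517/3767_fsm" for its free space map:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fork identifiers, mirroring the ForkNumber idea: the main data fork
 * plus the additional forks envisioned by this patch (FSM, and later
 * the visibility map). */
typedef enum
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM
} ForkNumber;

static const char *const forkNames[] = { "main", "fsm", "vm" };

/* Build the on-disk path for a given fork of a relation: the main fork
 * is just the relfilenode, other forks append "_<forkname>". */
static void
fork_path(char *buf, size_t len,
          unsigned dbOid, unsigned relNode, ForkNumber forknum)
{
    if (forknum == MAIN_FORKNUM)
        snprintf(buf, len, "base/%u/%u", dbOid, relNode);
    else
        snprintf(buf, len, "base/%u/%u_%s", dbOid, relNode,
                 forkNames[forknum]);
}
```

This is also why smgrdounlink can no longer implicitly close the relation: unlinking one fork's file must leave the other forks' files (and their smgr state) intact.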
-
- 02 May 2008, 1 commit
-
-
Committed by Tom Lane
support for a nonsegmented mode from md.c. Per recent discussions, there doesn't seem to be much value in a "never segment" option as opposed to segmenting with a suitably large segment size. So instead provide a configure-time switch to set the desired segment size in units of gigabytes. While at it, expose a configure switch for BLCKSZ as well. Zdenek Kotala
-
- 18 Apr 2008, 2 commits
-
-
Committed by Heikki Linnakangas
place to prevent reusing relation OIDs before next checkpoint, and DROP DATABASE.

First, if a database was dropped, bgwriter would still try to unlink the files that the rmtree() call by the DROP DATABASE command has already deleted, or is just about to delete. Second, if a database is dropped, and another database is created with the same OID, bgwriter would in the worst case delete a relation in the new database that happened to get the same OID as a dropped relation in the old database.

To fix these race conditions:
- make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
- make ForgetDatabaseFsyncRequests forget unlink requests as well.
- force checkpoint in dropdb on all platforms

Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't enough on its own to fix the problem of dropping and creating a database with the same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.

Per Tom Lane's bug report and proposal. Backpatch to 8.3.
-
Committed by Heikki Linnakangas
place to prevent reusing relation OIDs before next checkpoint, and DROP DATABASE.

First, if a database was dropped, bgwriter would still try to unlink the files that the rmtree() call by the DROP DATABASE command has already deleted, or is just about to delete. Second, if a database is dropped, and another database is created with the same OID, bgwriter would in the worst case delete a relation in the new database that happened to get the same OID as a dropped relation in the old database.

To fix these race conditions:
- make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
- make ForgetDatabaseFsyncRequests forget unlink requests as well.
- force checkpoint in dropdb on all platforms

Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't enough on its own to fix the problem of dropping and creating a database with the same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.

Per Tom Lane's bug report and proposal. Backpatch to 8.3.
-
- 11 Mar 2008, 1 commit
-
-
Committed by Tom Lane
than dividing them into 1GB segments as has been our longtime practice. This requires working support for large files in the operating system; at least for the time being, it won't be the default. Zdenek Kotala
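The segmenting that this commit makes optional boils down to simple arithmetic, sketched below under the standard assumptions (8kB BLCKSZ, 1GB segments, so 131072 blocks per segment): block N of a relation lives in segment N / RELSEG_SIZE, at offset N % RELSEG_SIZE within it. Segment 0 is just the relfilenode; segment k > 0 is the file "relfilenode.k".

```c
#include <assert.h>

#define RELSEG_SIZE 131072      /* blocks per segment: 1GB / 8kB BLCKSZ */

/* Which segment file a block lives in. */
static unsigned
segment_for_block(unsigned blocknum)
{
    return blocknum / RELSEG_SIZE;
}

/* The block's position within that segment file. */
static unsigned
offset_within_segment(unsigned blocknum)
{
    return blocknum % RELSEG_SIZE;
}
```

With large-file support in the OS, the division becomes unnecessary and the whole relation can be one file, which is what this commit enables.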
-
- 02 Jan 2008, 1 commit
-
-
Committed by Bruce Momjian
-
- 16 Nov 2007, 5 commits
-
-
Committed by Tom Lane
-
Committed by Bruce Momjian
avoid this problem in the future.)
-
Committed by Tom Lane
support the latter.
-
Committed by Bruce Momjian
-
Committed by Tom Lane
checkpoint. This guards against an unlikely data-loss scenario in which we re-use the relfilenode, then crash, then replay the deletion and recreation of the file. Even then we'd be OK if all insertions into the new relation had been WAL-logged ... but that's not guaranteed given all the no-WAL-logging optimizations that have recently been added. Patch by Heikki Linnakangas, per a discussion last month.
-
- 03 Jul 2007, 1 commit
-
-
Committed by Tom Lane
checkpoint. The comment claimed that we could do this anytime after setting the checkpoint REDO point, but actually BufferSync is relying on the assumption that buffers dumped by other backends will be fsync'd too. So we really could not do it any sooner than we are doing it.
-
- 13 Apr 2007, 1 commit
-
-
Committed by Tom Lane
fast flow of new fsync requests can prevent mdsync() from ever completing. This was an unforeseen consequence of a patch added in Mar 2006 to prevent the fsync request queue from overflowing. Problem identified by Heikki Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from Takahiro-san, Heikki, and Tom. Back-patch as far as 8.1 because a previous back-patch introduced the problem into 8.1 ...
-
- 18 Jan 2007, 1 commit
-
-
Committed by Tom Lane
pending fsyncs during DROP DATABASE. Obviously necessary in hindsight :-(
-
- 17 Jan 2007, 1 commit
-
-
Committed by Tom Lane
is deleted. A backend about to unlink a file now sends a "revoke fsync" request to the bgwriter to make it clean out pending fsync requests. There is still a race condition where the bgwriter may try to fsync after the unlink has happened, but we can resolve that by rechecking the fsync request queue to see if a revoke request arrived meanwhile.

This eliminates the former kluge of "just assuming" that an ENOENT failure is okay, and lets us handle the fact that on Windows it might be EACCES too, without introducing any questionable assumptions. After an idea of mine improved by Magnus.

The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port later. In the meantime this could do with some testing on Windows; I've been able to force it through the code path via ENOENT, but that doesn't prove that it actually fixes the Windows problem ...
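The revoke protocol can be modeled as a request stream applied to the pending-fsync table. This is a toy sketch with hypothetical names (the real pendingOpsTable is a hash table, and the queue lives in shared memory): a FORGET request removes any pending fsyncs for that relation, so the bgwriter never fsyncs a file the unlinking backend is about to remove.

```c
#include <assert.h>

/* Request types: a backend queues FSYNC_REQ as it dirties files, and
 * sends FORGET_REQ ("revoke fsync") just before unlinking a file. */
typedef enum { FSYNC_REQ, FORGET_REQ } ReqType;
typedef struct { ReqType type; unsigned rnode; } Request;

/*
 * Apply a stream of requests to the pending-fsync table (an array here
 * for simplicity).  FORGET_REQ drops all earlier fsync entries for the
 * same relation.  Returns the new number of pending entries.
 */
static int
apply_requests(const Request *reqs, int nreqs,
               unsigned *pending, int npending)
{
    for (int i = 0; i < nreqs; i++)
    {
        if (reqs[i].type == FSYNC_REQ)
            pending[npending++] = reqs[i].rnode;
        else
        {
            /* revoke: filter out every pending fsync for this rnode */
            int out = 0;
            for (int j = 0; j < npending; j++)
                if (pending[j] != reqs[i].rnode)
                    pending[out++] = pending[j];
            npending = out;
        }
    }
    return npending;
}
```

The residual race the commit message mentions is the window where the bgwriter has already picked an entry to fsync before the FORGET arrives; rechecking the queue for a revoke after an fsync failure closes it.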
-
- 06 Jan 2007, 1 commit
-
-
Committed by Bruce Momjian
back-stamped for this.
-
- 04 Jan 2007, 1 commit
-
-
Committed by Tom Lane
having md.c return a success/failure boolean to smgr.c, which was just going to elog anyway, let md.c issue the elog messages itself. This allows better error reporting, particularly in cases such as "short read" or "short write" which Peter was complaining of.

Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.)

Also, enforce that mdwrite() is to be used for rewriting existing blocks while mdextend() is to be used for extending the relation EOF. This restriction lets us get rid of the old ad-hoc defense against creating huge files by an accidental reference to a bogus block number: we'll only create new segments in mdextend(), not mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since we need to allow updates of blocks that were later truncated away.)

Also, clean up the original makeshift patch for bug #2737: move the responsibility for padding relation segments to full length into md.c.
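The tightened read-beyond-EOF policy can be sketched as a small decision function. This is a hypothetical simplification, not the real mdread() (which works on open segment files and calls elog itself): past-EOF reads are errors, except during WAL recovery or with zero_damaged_pages enabled, when a zero-filled page is handed back.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/*
 * Read block `blocknum` of a relation that is `nblocks` long.
 * Returns 0 on success, -1 where md.c would elog(ERROR).
 */
static int
read_block(unsigned blocknum, unsigned nblocks,
           int in_recovery, int zero_damaged_pages,
           char *page, size_t pagesz)
{
    if (blocknum >= nblocks)
    {
        if (in_recovery || zero_damaged_pages)
        {
            memset(page, 0, pagesz);    /* hand back zeroes */
            return 0;
        }
        return -1;                      /* past-EOF read is an error */
    }
    memset(page, 'x', pagesz);          /* stand-in for the actual read */
    return 0;
}
```

The recovery exception exists because WAL replay may legitimately touch blocks that were later truncated away.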
-