- 25 5月, 2011 2 次提交
-
-
由 Ying Han 提交于
Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: NYing Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: NPavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Bob Peterson 提交于
This patch fixes a race in the GFS2 glock state machine that may result in lockups. The symptom is that all nodes but one will hang, waiting for a particular glock. All the holder records will have the "W" (Waiting) bit set. The other node will typically have the glock stuck in Exclusive mode (EX) with no holder records, but the dinode will be cached. In other words, an entry with "I:" will appear in the glock dump for that glock, but nothing else. The race has to do with the glock "Pending Demote" bit, which can be set, then immediately reset, thus losing the fact that another node needs the glock. The sequence of events is: 1. Something schedules the glock workqueue (e.g. glock request from fs) 2. The glock workqueue gets to the point between the test of the reply pending bit and the spin lock: if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) { finish_xmote(gl, gl->gl_reply); drop_ref = 1; } down_read(&gfs2_umount_flush_sem); <---- i.e. here spin_lock(&gl->gl_spin); 3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and (b) the demote request which sets GLF_PENDING_DEMOTE 4. The following test is executed: if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) && gl->gl_state != LM_ST_UNLOCKED && gl->gl_demote_state != LM_ST_EXCLUSIVE) { This resets the pending demote flag, and gl->gl_demote_state is not equal to exclusive, however because the reply from the dlm arrived after we checked for the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE_FLAG. The patch closes the timing window by only transitioning the "Pending demote" bit to the "demote" flag once we know the other conditions (not unlocked and not exclusive) are met. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 05 5月, 2011 1 次提交
-
-
由 Steven Whitehouse 提交于
Previously we marked all locks being promoted to a higher mode with the try flag to avoid any potential deadlocks issues. The DLM is able to detect these and report them in way that GFS2 can deal with them correctly. So we can just request the required mode and wait for a response without needing to perform this check. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 26 4月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 4月, 2011 4 次提交
-
-
由 Steven Whitehouse 提交于
This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This adds support for two new flags. One keeps track of whether the glock is on the LRU list or not. The other isn't really a flag as such, but an indication of whether the glock has an attached object or not. This indication is reported without any locking, which is ok since we do not dereference the object pointer but merely report whether it is NULL or not. Also, this fixes one place where a tracepoint was missing, which was at the point we remove deallocated blocks from the journal. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 31 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 15 3月, 2011 1 次提交
-
-
由 Steven Whitehouse 提交于
As per RCU glock patch review comments, don't use the _raw version of this function here. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 11 3月, 2011 1 次提交
-
-
由 Bob Peterson 提交于
This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 09 3月, 2011 1 次提交
-
-
由 Steven Whitehouse 提交于
This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 17 2月, 2011 1 次提交
-
-
由 Tejun Heo 提交于
There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NAlan Stern <stern@rowland.harvard.edu> Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: NGreg Kroah-Hartman <gregkh@suse.de> Acked-by: NDmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>
-
- 31 1月, 2011 1 次提交
-
-
由 Steven Whitehouse 提交于
Somehow this tracepoint landed up in the wrong place. This moves it to where it should be. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 21 1月, 2011 1 次提交
-
-
由 Steven Whitehouse 提交于
This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 30 11月, 2010 5 次提交
-
-
由 Steven Whitehouse 提交于
We can only merge the fields into a bitfield if the locking rules for them are the same. In this case gl_spin covers all of the fields (write side) but a couple of them are used with GLF_LOCK as the read side lock, which should be ok since we know that the field in question won't be changing at the time. The gl_req setting has to be done earlier (in glock.c) in order to place it under gl_spin. The gl_reply setting also has to be brought under gl_spin in order to comply with the new rules. This saves 4*sizeof(unsigned int) per glock. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
-
由 Steven Whitehouse 提交于
The DLM never returns -EAGAIN in response to dlm_lock(), and even if it did, the test in gdlm_lock() was wrong anyway. Once that test is removed, it is possible to greatly simplify this code by simply using a "normal" error return code (0 for success). We then no longer need the LM_OUT_ASYNC return code which can be removed. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Joe Perches 提交于
Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The WQ_RESCUER flag should only be used internally to the workqueue implementation. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Acked-by: NTejun Heo <tj@kernel.org>
-
- 15 11月, 2010 1 次提交
-
-
由 Steven Whitehouse 提交于
This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 29 9月, 2010 1 次提交
-
-
由 Steven Whitehouse 提交于
The tests further down the recovery function relating to unlocking the journal need to be updated to match the intial test. Also, a test in the umount code which was surplus to requirements has been removed. Umounting spectator mounts now works correctly, as expected. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 20 9月, 2010 2 次提交
-
-
由 Steven Whitehouse 提交于
The recovery workqueue can be freezable since we want it to finish what it is doing if the system is to be frozen (although why you'd want to freeze a cluster node is beyond me since it will result in it being ejected from the cluster). It does still make sense for single node GFS2 filesystems though. The glock workqueue will benefit from being able to run more work items concurrently. A test running postmark shows improved performance and multi-threaded workloads are likely to benefit even more. It needs to be high priority because the latency directly affects the latency of filesystem glock operations. The delete workqueue is similar to the recovery workqueue in that it must not get blocked by memory allocations, and may run for a long time. Potentially other GFS2 threads might also be converted to workqueues, but I'll leave that for a later patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Acked-by: NTejun Heo <tj@kernel.org>
-
由 Steven Whitehouse 提交于
Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Tested-by: NAbhijith Das <adas@redhat.com>
-
- 02 8月, 2010 1 次提交
-
-
由 Steven Whitehouse 提交于
This is a clean up of the code which deals with LM_FLAG_NOEXP which aims to remove any possible race conditions by using gl_spin to cover the gap between testing for the LM_FLAG_NOEXP and the GL_FROZEN flag. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 29 7月, 2010 2 次提交
-
-
由 Steven Whitehouse 提交于
This reverts commit b7dc2df5. The initial patch didn't quite work since it doesn't cover all the possible routes by which the GLF_FROZEN flag might be set. A revised fix is coming up in the next patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This looks like a big change, but in reality its only a single line of actual code change, the rest is just moving a function to before its new caller. The "try" flag for glocks is a rather subtle and delicate setting since it requires that the state machine tries just hard enough to ensure that it has a good chance of getting the requested lock, but no so hard that the request can land up blocked behind another. The patch adds in an additional check which will fail any queued try locks if there is another request blocking the try lock request which is not granted and compatible, nor in progress already. The check is made only after all pending locks which may be granted have been granted. I've checked this with the reproducer for the reported flock bug which this is intended to fix, and it now passes. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 19 7月, 2010 1 次提交
-
-
由 Dave Chinner 提交于
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 15 7月, 2010 1 次提交
-
-
由 Bob Peterson 提交于
This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: NSteve Whitehouse <swhiteho@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 14 4月, 2010 1 次提交
-
-
由 Bob Peterson 提交于
This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 01 3月, 2010 3 次提交
-
-
由 Bob Peterson 提交于
This patch changes glock numbers from printing in decimal to hex. Since DLM prints corresponding resource IDs in hex, it makes debugging easier. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
As a consequence of the previous patch, we can now remove the loop which used to be required due to the circular dependency between the inodes and glocks. Instead we can just invalidate the inodes, and then clear up any glocks which are left. Also we no longer need the rwsem since there is no longer any danger of the inode invalidation calling back into the glock code (and from there back into the inode code). Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 03 2月, 2010 1 次提交
-
-
由 Steven Whitehouse 提交于
Although all glocks are, by the time of the umount glock wait, scheduled for demotion, some of them haven't made it far enough through the process for the original set of waiting code to wait for them. This extends the ref count to the whole glock lifetime in order to ensure that the waiting does catch all glocks. It does make it a bit more invasive, but it seems the only sensible solution at the moment. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 03 12月, 2009 2 次提交
-
-
由 Steven Whitehouse 提交于
This patch fixes some ref counting issues. Firstly by moving the point at which we drop the ref count after a dlm lock operation has completed we ensure that we never call gfs2_glock_hold() on a lock with a zero ref count. Secondly, by using atomic_dec_and_lock() in gfs2_glock_put() we ensure that at no time will a glock with zero ref count appear on the lru_list. That means that we can remove the check for this in our shrinker (which was racy). Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
We need to be careful of the ordering between clearing the GLF_LOCK bit and scheduling the workqueue. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 30 7月, 2009 4 次提交
-
-
由 Benjamin Marzinski 提交于
When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Benjamin Marzinski 提交于
GFS2 was placing far too many glocks on the reclaim list that were not good candidates for freeing up from cache. These locks would sit there and repeatedly get scanned to see if they could be reclaimed, wasting a lot of time when there was memory pressure. This fix does more checks on the locks to see if they are actually likely to be removable from cache. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Benjamin Marzinski 提交于
It is possible for gfs2_shrink_glock_memory() to check a glock for demotion that's in the process of being freed by gfs2_glock_put(). In this case, gfs2_shrink_glock_memory() will acquire a new reference to this glock, and then try to free the glock itself when it drops the refernce. To solve this, gfs2_shrink_glock_memory() just needs to check if the glock is in the process of being freed, and if so skip it without ever unlocking the lru_lock. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Acked-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch removes some of the special cases that the shrinker was trying to deal with. As a result we leave fewer items on the list and none at all which cannot be demoted. This makes the list scanning more efficient and solves some issues seen with large numbers of inodes. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-