- 03 1月, 2014 7 次提交
-
-
由 Steven Whitehouse 提交于
Each rgrp header is represented as a single extent on disk, so we can calculate the position within the address space, since we are using address spaces mapped 1:1 to the disk. This means that it is possible to use the range based versions of filemap_fdatawrite/wait and for invalidating the page cache. Our eventual intent is to then be able to merge the address spaces used for rgrps into a single address space, rather than to have one for each glock, saving memory and reducing complexity. Since during umount, the rgrp structures are disposed of before the glocks, we need to store the extent information in the glock so that is is available for a final invalidation. This patch uses a field which is otherwise unused in rgrp glocks to do that, so that we do not have to expand the size of a glock. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Since gfs2_inplace_reserve() is always called with a valid alloc parms structure, there is no need to test for this within the function itself - and in any case, after we've all ready dereferenced it anyway. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
There is only one place this is used, when reading in the quota changes at mount time. It is not really required and much simpler to just convert the fields from the on-disk structure as required. There should be no functional change as a result of this patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
For historical reasons, we drop and retake the log lock in ->releasepage() however, since there is no reason why we cannot hold the log lock over the whole function, this allows some simplification. In particular, pinning a buffer is only ever done under the log lock, so it is possible here to remove the test for pinned buffers in the second loop, since it is impossible for that to happen (it is also tested in the first loop). As a result, two tests made later in the second loop become constants and can also be reduced to the only possible branch. So the net result is to remove various bits of unreachable code and make this more readable. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
With the preceding patch, we started accepting block reservations smaller than the ideal size, which requires a lot more parsing of the bitmaps. To reduce the amount of bitmap searching, this patch implements a scheme whereby each rgrp keeps track of the point at this multi-block reservations will fail. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This is just basically a resend of a patch I posted earlier. It didn't change from its original, except in diff offsets, etc: This patch fixes a bug in the GFS2 block allocation code. The problem starts if a process already has a multi-block reservation, but for some reason, another process disqualifies it from further allocations. For example, the other process might set on the GFS2_RDF_ERROR bit. The process holding the reservation jumps to label skip_rgrp, but that label comes after the code that removes the reservation from the tree. Therefore, the no longer usable reservation is not removed from the rgrp's reservations tree; it's lost. Eventually, the lost reservation causes the count of reserved blocks to get off, and eventually that causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger. This patch moves the call to after label skip_rgrp so that the disqualified reservation is properly removed from the tree, thus keeping the rgrp rd_reserved count sane. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
Here is a second try at a patch I posted earlier, which also implements suggestions Steve made: Before this patch, GFS2 would keep searching through all the rgrps until it found one that had a chunk of free blocks big enough to satisfy the size hint, which is based on the file write size, regardless of whether the chunk was big enough to perform the write. However, when doing big writes there may not be a large enough chunk of free blocks in any rgrp, due to file system fragmentation. The largest chunk may be big enough to satisfy the write request, but it may not meet the ideal reservation size from the "size hint". The writes would slow to a crawl because every write would search every rgrp, then finally give up and default to a single-block write. In my case, performance would drop from 425MB/s to 18KB/s, or 24000 times slower. This patch basically makes it so that if we can't find a contiguous chunk of blocks big enough to satisfy the sizehint, we'll use the largest chunk of blocks we found that will still contain the write. It does so by keeping track of the largest run of blocks within the rgrp. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 02 1月, 2014 1 次提交
-
-
由 Tetsuo Handa 提交于
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that RCU read lock is held when using task_struct returned from pid_task(). Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 20 12月, 2013 2 次提交
-
-
由 Steven Whitehouse 提交于
We need to wait for any outstanding DIO to complete in a couple of situations. Firstly, in case we are changing out of deferred mode (in inode_go_sync) where GLF_DIRTY will not be set. That call could be prefixed with a test for gl_state == LM_ST_DEFERRED but it doesn't seem worth it bearing in mind that the test for outstanding DIO is very quick anyway, in the usual case that there is none. The second case is in inode_go_lock which will catch the cases where we have a cached EX lock, but where we grant deferred locks against it so that there is no glock state transistion. We only need to wait if the state is not deferred, since DIO is valid anyway in that state. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
In patch 209806ab we allowed local deferred locks to be granted against a cached exclusive lock. That opened up a corner case which this patch now fixes. The solution to the problem is to check whether we have cached pages each time we do direct I/O and if so to unmap, flush and invalidate those pages. Since the glock state machine normally does that for us, mostly the code will be a no-op. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 14 12月, 2013 3 次提交
-
-
由 Bob Peterson 提交于
This patch fixes a slab memory leak that sometimes can occur for files with a very short lifespan. The problem occurs when a dinode is deleted before it has gotten to the journal properly. In the leak scenario, the bd object is pinned for journal committment (queued to the metadata buffers queue: sd_log_le_buf) but is subsequently unpinned and dequeued before it finds its way to the ail or the revoke queue. In this rare circumstance, the bd object needs to be freed from slab memory, or it is forgotten. We have to be very careful how we do it, though, because multiple processes can call gfs2_remove_from_journal. In order to avoid double-frees, only the process that does the unpinning is allowed to free the bd. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
Function gfs2_remove_from_ail drops the reference on the bh via brelse. This patch fixes a race condition whereby bh is deferenced after the brelse when setting bd->bd_blkno = bh->b_blocknr; Under certain rare circumstances, bh might be gone or reused, and bd->bd_blkno is set to whatever that memory happens to be, which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails the test "bd->bd_blkno >= blkno" which causes it to never be freed. The end result is that the bd is never freed from the bufdata cache, which results in this error: slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This is a GFS2 version of Tejun's patch: 4f331f01 vfs: don't hold s_umount over close_bdev_exclusive() call In this case its blkdev_put itself that is the issue and this patch uses the same solution of dropping and retaking s_umount. Reported-by: NTejun Heo <tj@kernel.org> Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 22 11月, 2013 1 次提交
-
-
由 Steven Whitehouse 提交于
In the case that atomic_open calls finish_no_open() with the dentry that was supplied to gfs2_atomic_open() an extra reference count is required. This patch fixes that issue preventing a bug trap triggering at umount time. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 21 11月, 2013 1 次提交
-
-
由 Michal Nazarewicz 提交于
Commit [e66cf161: GFS2: Use lockref for glocks] replaced call: atomic_read(&gi->gl->gl_ref) == 0 with: __lockref_is_dead(&gl->gl_lockref) therefore changing how gl is accessed, from gi->gl to plan gl. However, gl can be a NULL pointer, and so gi->gl needs to be used instead (which is guaranteed not to be NULL because fo the while loop checking that condition). Signed-off-by: NMichal Nazarewicz <mina86@mina86.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 16 11月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 11月, 2013 3 次提交
-
-
由 Steven Whitehouse 提交于
By using the generic list_lru code, we can now separate the per sb quota list locking from the lru locking. The lru lock is made into the inner-most lock. As a result of this new lock order, we may occasionally see items on the per-sb quota list which are "dead" so that the two places where we traverse that list are updated to take account of that. As a result of this patch, the gfs2 quota shrinker is now NUMA zone aware, and we are also laying the foundations for further improvments in due course. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAbhijith Das <adas@redhat.com> Tested-by: NAbhijith Das <adas@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>
-
由 Steven Whitehouse 提交于
This is a straight forward rename which is in preparation for introducing the generic list_lru infrastructure in the following patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAbhijith Das <adas@redhat.com> Tested-by: NAbhijith Das <adas@redhat.com>
-
由 Steven Whitehouse 提交于
This patch adds reflink support to the quota data cache. It looks a bit strange because we still don't have a sensible split in the lookup by id and the lru list. That is coming in later patches though. The intent here is just to swap the current ref count for reflinks in all cases with as little as possible other change. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAbhijith Das <adas@redhat.com> Tested-by: NAbhijith Das <adas@redhat.com>
-
- 25 10月, 2013 1 次提交
-
-
由 Al Viro 提交于
duplicated to hell and back... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 10月, 2013 1 次提交
-
-
由 Steven Whitehouse 提交于
Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 04 10月, 2013 4 次提交
-
-
由 Steven Whitehouse 提交于
Now that gfs2_quota_sync can be potentially called from multiple threads, we should protect this bit of code, and the sync generation number in particular in order to ensure that there are no races when syncing quotas. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
由 Steven Whitehouse 提交于
The function qd_trylock was not a trylock despite its name and can be inlined into gfs2_quota_unlock in order to make the code a bit clearer. There should be no functional change as a result of this patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
由 Steven Whitehouse 提交于
There should be no functional change bar the removal of a test of the MS_READONLY flag which would never be reachable. This merges the common code from qd_fish and qd_trylock into a single function and calls it from both those places. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
由 Steven Whitehouse 提交于
There is no need for a paramater which relates to the internals of quota to be exposed to users. The only possible use would be to turn it up so large that the memory allocation fails. So lets remove it and set it to a sensible value which ensures that we don't ask for multipage allocations. Currently the size of struct gfs2_holder means that the caluclated value is identical to the previous default value, so there should be no functional change. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
- 02 10月, 2013 3 次提交
-
-
由 Steven Whitehouse 提交于
This function is only called twice, and both callers are quota related, so lets move this function into quota.c and make it static. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
When setting the starting point for block allocation, there were calls to both gfs2_rbm_to_block() and gfs2_rbm_from_block() in the common case of there being an active reservation. The gfs2_rbm_from_block() function can be quite slow, and since the two conversions were effectively a no-op, it makes sense to avoid them entirely in this case. There is no functional change here, but the code should be a bit more efficient after this patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch adds a structure to contain allocation parameters with the intention of future expansion of this structure. The idea is that we should be able to add more information about the allocation in the future in order to allow the allocator to make a better job of placing the requests on-disk. There is no functional difference from applying this patch. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 27 9月, 2013 1 次提交
-
-
由 Steven Whitehouse 提交于
The reservation for an inode should be cleared when it is truncated so that we can start again at a different offset for future allocations. We could try and do better than that, by resetting the search based on where the truncation started from, but this is only a first step. In addition, there are three callers of gfs2_rs_delete() but only one of those should really be testing the value of i_writecount. While we get away with that in the other cases currently, I think it would be better if we made that test specific to the one case which requires it. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 23 9月, 2013 1 次提交
-
-
由 Miklos Szeredi 提交于
We need to dput() the result of d_splice_alias(), unless it is passed to finish_no_open(). Edited by Steven Whitehouse in order to make it apply to the current GFS2 git tree, and taking account of a prerequisite patch which hasn't been applied. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: stable@vger.kernel.org
-
- 18 9月, 2013 2 次提交
-
-
由 Bob Peterson 提交于
Since the previous patch eliminated bi in favor of bii, this follow-on patch needed to be adjusted accordingly. Here is the revised version. This patch adds a new function, gfs2_rbm_incr, which increments an rbm structure. This is more efficient than calling gfs2_rbm_to_block, incrementing, then calling gfs2_rbm_from_block. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This is a respin of the original patch. As Steve pointed out, the introduction of field bii makes it easy to eliminate bi itself. This revised patch does just that, replacing bi with bii. This patch adds a new field to the rbm structure, called bii, which is an index into the array of bitmaps for an rgrp. This replaces *bi which was a pointer to the bitmap. This is being done for further optimizations. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 17 9月, 2013 5 次提交
-
-
由 Bob Peterson 提交于
When we used try locks for rgrps on block allocations, it was important to clear the flags field so that we used a blocking hold on the glock. Now that we're not doing try locks, clearing flags is unnecessary, and a waste of time. In fact, it's probably doing the wrong thing because it clears the GL_SKIP bit that was set for the lvb tracking purposes. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch introduces a new field in the bitmap structure called bi_blocks. Its purpose is to save us from constantly multiplying bi_len by the constant GFS2_NBBY. It also paves the way for more optimization in a future patch. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
In function gfs2_rbm_from_block, it starts by checking if the block falls within the first bitmap. It does so by checking if the rbm's offset is less than (rbm->bi->bi_start + rbm->bi->bi_len) * GFS2_NBBY. However, the first bitmap will always have bi_start==0. Therefore this is an unnecessary calculation in a function that gets called billions of times. This patch removes the reference to bi_start. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Miklos Szeredi 提交于
unless it was given an IS_ERR(inode), which isn't the case here. So clean up the unnecessary error handling in gfs2_create_inode(). This paves the way for real fixes (hence the stable Cc). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: stable@vger.kernel.org
-
由 Miklos Szeredi 提交于
In gfs2_create_inode() set FILE_CREATED in *opened. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 9月, 2013 1 次提交
-
-
由 Kirill A. Shutemov 提交于
truncate_pagecache() doesn't care about old size since commit cedabed4 ("vfs: Fix vmtruncate() regression"). Let's drop it. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 9月, 2013 2 次提交
-
-
由 Dave Chinner 提交于
Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NGlauber Costa <glommer@openvz.org> Acked-by: NMel Gorman <mgorman@suse.de> Acked-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: NJan Kara <jack@suse.cz> Acked-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Glauber Costa 提交于
The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: NGlauber Costa <glommer@openvz.org> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-