- 02 2月, 2013 2 次提交
-
-
由 Bob Peterson 提交于
This patch allocates a block reservation structure before growing or shrinking a file. Without this structure, the grow or shink code can reference the bad pointer. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The intent here is to split the processing of the glock lru list into two parts, so that the selection of glocks and the disposal are separate functions. The plan is then, that further updates can then be made to these functions in the future to improve the selection of glocks and also the efficiency of glock disposal. The new feature which this patch brings is sorting the glocks to be disposed of into glock number (and thus also disk block number) order. Not all glocks will need i/o in order to dispose of them, but some will, and at least we'll generate mostly disk block order i/o now. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 29 1月, 2013 7 次提交
-
-
由 Steven Whitehouse 提交于
Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The freeze code has not been looked at a lot recently. Upstream has moved on, and this is an attempt to catch us back up again. There is a vfs level interface for the freeze code which can be called from our (obsolete, but kept for backward compatibility purposes) sysfs freeze interface. This means freezing this way vs. doing it from the ioctl should now work in identical fashion. As a result of this, the freeze function is only called once and we can drop our own special purpose code for counting the number of freezes. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The locking in gfs2_attach_bufdata() was type specific (data/meta) which made the function rather confusing. This patch moves the core of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata() and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta() As a result all of the locking related to adding data and metadata to the journal is now in these two functions. This should help to clarify what is going on, and give us some opportunities to simplify in some cases. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch copies the body of gfs2_trans_add_bh into the two newly added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can then move the .lo_add functions from lops.c into trans.c and call them directly. As a result of this, we no longer need to use the .lo_add functions at all, so that is removed from the log operations structure. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This moves the lo_add function for revokes into trans.c, removing a function call and making the code easier to read. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This breaks out the LRU scanning function from the shrinker in preparation for adding other callers to the LRU scanner. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 28 1月, 2013 1 次提交
-
-
由 David Teigland 提交于
The recent commit fb6791d1 included the wrong logic. The lvbptr check was incorrectly added after the patch was tested. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 02 1月, 2013 4 次提交
-
-
由 Bob Peterson 提交于
In function rg_mblk_search, it's searching for multiple blocks in a given state (e.g. "free"). If there's an active block reservation its goal is the next free block of that. If the resource group contains the dinode's goal block, that's used for the search. But if neither is the case, it uses the rgrp's last allocated block. That way, consecutive allocations appear after one another on media. The problem comes in when you hit the end of the rgrp; it would never start over and search from the beginning. This became a problem, since if you deleted all the files and data from the rgrp, it would never start over and find free blocks. So it had to keep searching further out on the media to allocate blocks. This patch resets the rd_last_alloc after it does an unsuccessful search at the end of the rgrp. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch adds a return code check after calling function gfs2_rbm_from_block while determining the free extent size. That way, when the end of an rgrp is reached, it won't try to process unaligned blocks after the end. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Abhijith Das 提交于
QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible to come out of the function with a valid ip->i_res allocation but it gets freed before use resulting in a NULL ptr dereference. This patch envelopes the initial short-circuit check for non-NULL ip->i_res into the mutex lock. With this patch, I was able to successfully run the reproducer test multiple times. Resolves: rhbz#878476 Signed-off-by: NAbhi Das <adas@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Nathan Straz 提交于
When generating the DLM lock name, a value of 0 would skip the loop and leave the string unchanged. This left locks with a value of 0 unlabeled. Initializing the string to '0' fixes this. Signed-off-by: NNathan Straz <nstraz@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 18 12月, 2012 1 次提交
-
-
由 Andrew Morton 提交于
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 12月, 2012 1 次提交
-
-
由 Rafael Aquini 提交于
Overhaul struct address_space.assoc_mapping renaming it to address_space.private_data and its type is redefined to void*. By this approach we consistently name the .private_* elements from struct address_space as well as allow extended usage for address_space association with other data structures through ->private_data. Also, all users of old ->assoc_mapping element are converted to reflect its new name and type change (->private_data). Signed-off-by: NRafael Aquini <aquini@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 11月, 2012 1 次提交
-
-
由 Bob Peterson 提交于
This patch fixes a cluster coherency problem that occurs when one node creates a file, does several writes, then a different node tries to write to the same file. When the inode's glock is demoted, the inode wasn't synced to the media properly because the gl_object wasn't set. Later, the flush daemon noticed the uncommitted data and tried to flush it, only to discover the glock was no longer locked properly in exclusive mode. That caused an assert withdraw. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 16 11月, 2012 2 次提交
-
-
由 Bob Peterson 提交于
This patch adds a return code check after attempting to allocate a new inode during dinode creation. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch changes the block allocation trace so that it references the rgd's glock rather than the inode's glock. Now that the order of inode creation is switched, this prevents a reference to the glock which may not be set yet. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 15 11月, 2012 2 次提交
-
-
由 David Teigland 提交于
The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Save the effort of allocating, reading and writing the lvb for most glocks that do not use it. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 14 11月, 2012 1 次提交
-
-
由 David Teigland 提交于
When unmounting, gfs2 does a full dlm_unlock operation on every cached lock. This can create a very large amount of work and can take a long time to complete. However, the vast majority of these dlm unlock operations are unnecessary because after all the unlocks are done, gfs2 leaves the dlm lockspace, which automatically clears the locks of the leaving node, without unlocking each one individually. So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to remove the locks implicitly. The one exception is when the lock's lvb is being used. In this case, dlm_unlock is called because it may update the lvb of the resource. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 13 11月, 2012 4 次提交
-
-
由 Steven Whitehouse 提交于
For filesystems with only a single resource group, we need to be careful that the allocation loop will not land up with a NULL resource group. This fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function was being used instead of gfs2_rgrpd_get_first() Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
Since we now have a dirty_inode that takes care of manipulating the inode buffer and writing from the inode to the buffer, we can eliminate some unnecessary buffer manipulations in gfs2_unlink_inode that are now redundant. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch changes the gfs2_dir_add function so that it uses the dirty_inode function (via mark_inode_dirty) rather than manually updating the dinode. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch fixes an issue relating to not having enough revokes available when truncating journaled data files. In order to ensure that we do no run out, the truncation is broken into separate pieces if it is large enough. Tested using fsx on a journaled data file. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 07 11月, 2012 13 次提交
-
-
由 Steven Whitehouse 提交于
Just like ext3, this works on the root directory and any directory with the +T flag set. Also, just like ext3, any subdirectory created in one of the just mentioned cases will be allocated to a random resource group (GFS2 equivalent of a block group). If you are creating a set of directories, each of which will contain a job running on a different node, then by setting +T on the parent directory before creating the subdirectories, each will land up in a different resource group, and thus resource group contention between nodes will be kept to a minimum. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Rather than using the parent directory's allocation context, this patch allocated the new inode earlier in the process and then uses it to contain all the information required. As a result, we can now use the new inode's own allocation context to allocate it rather than having to use the parent directory's context. This give us a lot more flexibility in where the inode is placed on disk. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch uses information gathered by the recent glock statistics patch in order to derrive a boolean verdict on the congestion status of a resource group. This is then used when making decisions on which resource group to choose during block allocation. The aim is to avoid resource groups which are heavily contended by other nodes, while still ensuring locality of access wherever possible. Once a reservation has been made in a particular resource group we continue to use that resource group until a new reservation is required. This should help to ensure that we do not change resource groups too often. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
[Editorial: This is a nit, but has been a minor irritation for a long time:] This patch renames glops structure item for go_xmote_th to go_sync. The functionality is unchanged; it's just for readability. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch is a rewrite of function gfs2_rbm_from_block. Rather than looping to find the right bitmap, the code now does a few simple math calculations. I compared the performance of both algorithms side by side and the new algorithm is noticeably faster. Sample instrumentation output from a "fast" machine: 5 million calls: millisec spent: Orig: 166 New: 113 5 million calls: millisec spent: Orig: 189 New: 114 In addition, I ran postmark (on a somewhat slowr CPU) before the after the new algorithm was put in place and postmark showed a decent improvement: Before the new algorithm: ------------------------- Time: 645 seconds total 584 seconds of transactions (171 per second) Files: 150087 created (232 per second) Creation alone: 100000 files (2083 per second) Mixed with transactions: 50087 files (85 per second) 49995 read (85 per second) 49991 appended (85 per second) 150087 deleted (232 per second) Deletion alone: 100174 files (7705 per second) Mixed with transactions: 49913 files (85 per second) Data: 273.42 megabytes read (434.08 kilobytes per second) 852.13 megabytes written (1.32 megabytes per second) With the new algorithm: ----------------------- Time: 599 seconds total 530 seconds of transactions (188 per second) Files: 150087 created (250 per second) Creation alone: 100000 files (1886 per second) Mixed with transactions: 50087 files (94 per second) 49995 read (94 per second) 49991 appended (94 per second) 150087 deleted (250 per second) Deletion alone: 100174 files (6260 per second) Mixed with transactions: 49913 files (94 per second) Data: 273.42 megabytes read (467.42 kilobytes per second) 852.13 megabytes written (1.42 megabytes per second) Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Two of the bug traps here could really be warnings. The others are converted from BUG() to GLOCK_BUG_ON() since we'll most likely need to know the glock state in order to debug any issues which arise. As a result of this, __dump_glock has to be renamed and is no longer static. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Benjamin Marzinski 提交于
In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the buffer without having the gfs2_log_lock held. It was then assuming it would stay attached for the rest of the function. However, without either the log lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any time. This patch moves the locking before the test. If there isn't a bd already attached, gfs2 can safely allocate one and attach it before locking. There is no way that the newly allocated bd could be on the ail list, and thus no way for __gfs2_ail_flush() to detach it. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Benjamin Marzinski 提交于
file_accessed() was being called by gfs2_mmap() with a shared glock. If it needed to update the atime, it was crashing because it dirtied the inode in gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode() checked if the caller was already holding a glock, but it didn't make sure that the glock was in the exclusive state. Now, instead of calling file_accessed() while holding the shared lock in gfs2_mmap(), file_accessed() is called after grabbing and releasing the glock to update the inode. If file_accessed() needs to update the atime, it will grab an exclusive lock in gfs2_dirty_inode(). gfs2_dirty_inode() now also checks to make sure that if the calling process has already locked the glock, it has an exclusive lock. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Lukas Czerner 提交于
Currently implementation in gfs2 uses FITRIM arguments as it were in file system blocks units which is wrong. The FITRIM arguments (fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are actually in bytes. Moreover, check for start argument beyond the end of file system, len argument being smaller than file system block and minlen argument being bigger than biggest resource group were missing. This commit converts the code to convert FITRIM argument to file system blocks and also adds appropriate checks mentioned above. All the problems were recognised by xfstests 251 and 260. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Lukas Czerner 提交于
When the fstrim_range argument is not provided by user in FITRIM ioctl we should just return EFAULT and not promoting bad behaviour by filling the structure in kernel. Let the user deal with it. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Andrew Price 提交于
Cleans up two cases where variables were assigned values but then never used again. Signed-off-by: NAndrew Price <anprice@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Andrew Price 提交于
Despite the return value from kmem_cache_zalloc() being checked, the error wasn't being returned until after a possible null pointer dereference. This patch returns the error immediately, allowing the removal of the error variable. Signed-off-by: NAndrew Price <anprice@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Andrew Price 提交于
Check the return value of gfs2_rs_alloc(ip) and avoid a possible null pointer dereference. Signed-off-by: NAndrew Price <anprice@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 10 10月, 2012 1 次提交
-
-
由 Hugh Dickins 提交于
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(), u64 inum = fid->raw[2]; which is unhelpfully reported as at the end of shmem_alloc_inode(): BUG: unable to handle kernel paging request at ffff880061cd3000 IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Call Trace: [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6 Right, tmpfs is being stupid to access fid->raw[2] before validating that fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may fall at the end of a page, and the next page not be present. But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and could oops in the same way: add the missing fh_len checks to those. Reported-by: NSasha Levin <levinsasha928@gmail.com> Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Sage Weil <sage@inktank.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-