- 07 Feb, 2010 2 commits
-
-
Committed by Jun'ichi Nomura
Thanks Thomas and Christoph for testing and review. I removed 'smp_wmb()' before up_write from the previous patch, since up_write() should have the necessary ordering constraints (i.e. the change of s_frozen is visible to others after up_write). I'm quite sure the change is harmless, but if you are uncomfortable with Tested-by/Reviewed-by on the modified patch, please remove them. If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of deactivate_locked_super(). Also, keep sb->s_frozen consistent so that remount can check the frozen state. Otherwise a crash reported here can happen: http://lkml.org/lkml/2010/1/16/37 http://lkml.org/lkml/2010/1/28/53 This patch should be applied to the 2.6.32 stable series, too. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Thomas Backlund <tmb@mandriva.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 05 Feb, 2010 6 commits
-
-
Committed by Aneesh Kumar K.V
This version of the i_size fix for fallocate makes sure we only update the i_size when the current fallocate is really operating outside of i_size. Signed-off-by: Chris Mason <chris.mason@oracle.com>
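
The rule described above can be sketched in user-space C. This is only an illustration, not the btrfs code: the function name `isize_after_fallocate` and the `keep_size` flag (standing in for a FALLOC_FL_KEEP_SIZE-style request) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the btrfs implementation: returns the i_size
 * an inode should have after fallocate(offset, len). */
int64_t isize_after_fallocate(int64_t i_size, int64_t offset,
                              int64_t len, bool keep_size)
{
    int64_t end = offset + len;   /* end of the requested range */

    if (keep_size)                /* caller asked not to change the size */
        return i_size;
    /* Only grow i_size when the request reaches past the current EOF. */
    return end > i_size ? end : i_size;
}
```

A request that lies entirely inside the file leaves the size alone, while one that extends past EOF sets the size to the end of the requested range rather than to the end of whatever was allocated.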
-
Committed by Josef Bacik
When running the following fio job

    [torrent]
    filename=torrent-test
    rw=randwrite
    size=4g
    filesize=4g
    bs=4k
    ioengine=sync

you would see long stalls where no work was being done. That is because we were doing all this extra work to read in the file extent outside of the transaction, however in the random io case this ends up hurting us because the file extents are not there to begin with. So axe this logic, since we end up reading in the file extent when we go to update it anyway. This took the fio job from 11 MB/s with several ~10 second stalls to 24 MB/s with a couple of 1-2 second stalls. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Yan, Zheng
When dropping an empty tree, walk_down_tree() skips checking extent information for the tree root. This triggers a BUG_ON in walk_up_proc(). Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Miao Xie
Mounting a bad filesystem caused a BUG_ON(). The steps to reproduce it are:

    # mkfs.btrfs /dev/sda2
    # mount /dev/sda2 /mnt
    # mkfs.btrfs /dev/sda1 /dev/sda2
    (the program says that /dev/sda2 was mounted, and then exits.)
    # umount /mnt
    # mount /dev/sda1 /mnt

At the third step, mkfs.btrfs exited partway through making the filesystem, so the initialization of the filesystem did not finish. The filesystem was therefore bad, and it caused a BUG_ON() when mounting it. But a BUG_ON() should be triggered by wrong code, not by a user's operation, so I think it is a bug in btrfs. This patch fixes it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Roel Kluin
It appears the error return should be negative. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
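
For context, kernel code conventionally returns 0 on success and a negative errno value on failure, and callers test for `ret < 0`. The sketch below shows why the sign matters. `check_buffer` and `caller_sees_error` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper following the kernel convention:
 * 0 on success, a negative errno value on failure. */
int check_buffer(const void *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -EINVAL;   /* negative, so "ret < 0" checks catch it */
    return 0;
}

/* Typical caller: treats any negative value as an error. */
int caller_sees_error(const void *buf, size_t len)
{
    int ret = check_buffer(buf, len);

    return ret < 0;       /* a positive EINVAL would slip through here */
}
```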
-
Committed by Yan, Zheng
Increase the extent buffer's reference count while holding the lock. Otherwise it can race with try_release_extent_buffer. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 03 Feb, 2010 7 commits
-
-
Committed by Trond Myklebust
If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should not set the S_IFMT part of inode->i_mode. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Trond Myklebust
Ensure that we unregister the bdi before kill_anon_super() calls ida_remove() on our device name. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
Committed by Trond Myklebust
The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail. Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs. Since the NFS code assumes that the page stays mapped for as long as the writeback is active, we can end up Oopsing (among other things). The only safe fix here is to convert nfs_wait_on_request(), so as to make it uninterruptible (as is already the case with wait_on_page_writeback()). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
Committed by Steven Whitehouse
Although all glocks are, by the time of the umount glock wait, scheduled for demotion, some of them haven't made it far enough through the process for the original set of waiting code to wait for them. This extends the ref count to the whole glock lifetime in order to ensure that the waiting does catch all glocks. It does make it a bit more invasive, but it seems the only sensible solution at the moment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse
This patch adds a wait on umount between the point at which we dispose of all glocks and the point at which we unmount the lock protocol. This ensures that we've received all the replies to our unlock requests before we stop the locking. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
-
Committed by anfei zhou
The cache alias problem will happen if changes to a user shared mapping are not flushed before copying: the user and kernel mappings may then be mapped into two different cache lines, and it is impossible to guarantee coherence after iov_iter_copy_from_user_atomic. So the right steps should be:

    flush_dcache_page(page);
    kmap_atomic(page);
    write to page;
    kunmap_atomic(page);
    flush_dcache_page(page);

More precisely, we might create two new APIs flush_dcache_user_page and flush_dcache_kern_page to replace the two flush_dcache_page calls accordingly. Here is a snippet tested on omap2430 with VIPT cache, and I think it is not ARM-specific:

    int val = 0x11111111;
    fd = open("abc", O_RDWR);
    addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    *(addr+0) = 0x44444444;
    tmp = *(addr+0);
    *(addr+1) = 0x77777777;
    write(fd, &val, sizeof(int));
    close(fd);

The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777.

Signed-off-by: Anfei <anfei.zhou@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <linux-arch@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
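
The reproducer quoted in the message omits declarations and error handling. A self-contained version might look like the sketch below. The temporary file under /tmp, the function name `run_alias_test`, and reading the result back with `pread()` are all assumptions added for the example; the original message does not say how the result was inspected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Runs the store/load/store/write sequence from the commit message on a
 * fresh 4096-byte temporary file and copies the first two words of the
 * file into out[]. Returns 0 on success, -1 on any failure. */
int run_alias_test(unsigned int out[2])
{
    char path[] = "/tmp/alias-test-XXXXXX";
    unsigned int val = 0x11111111;
    volatile unsigned int *addr;
    unsigned int tmp;
    int fd, ret = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file disappears once fd is closed */
    if (ftruncate(fd, 4096) != 0)
        goto out_close;

    addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto out_close;

    addr[0] = 0x44444444;                /* dirty the user mapping */
    tmp = addr[0];                       /* read it back through the mapping */
    (void)tmp;
    addr[1] = 0x77777777;
    /* write() goes through the kernel mapping of the same page. */
    if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
        goto out_unmap;
    if (pread(fd, out, 2 * sizeof(out[0]), 0) != (ssize_t)(2 * sizeof(out[0])))
        goto out_unmap;
    ret = 0;
out_unmap:
    munmap((void *)addr, 4096);
out_close:
    close(fd);
    return ret;
}
```

On a machine with coherent (non-aliasing) caches, `out` should hold 0x11111111 and 0x77777777. The commit reports that an aliasing VIPT cache could instead leave 0x44444444 in the first word.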
-
Committed by Linus Torvalds
Commit 221af7f8 ("Split 'flush_old_exec' into two functions") split the function at the point of no return - i.e. right where there were no more error cases to check. That made sense from a technical standpoint, but when we then also combined it with the actual personality setting going in between flush_old_exec() and setup_new_exec(), it needs to be a bit more careful. In particular, we need to make sure that we really flush the old personality bits in the 'flush' stage, rather than later in the 'setup' stage, since otherwise we might be flushing the _new_ personality state that we're just setting up. So this moves the flags and personality flushing (and 'flush_thread()', which is the arch-specific function that generally resets lazy FP state etc) of the old process into flush_old_exec(), so that it doesn't affect any state that execve() is setting up for the new process environment. This was reported by Michal Simek as breaking his Microblaze qemu environment. Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Feb, 2010 3 commits
-
-
Committed by Steven Whitehouse
This is called under a glock, so it's a good plan to use GFP_NOFS. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse
The do_div() call needs to remain. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Benjamin Marzinski
Since gfs2 writes the rindex file a block at a time, and releases the exclusive lock after each block, it is possible that another process will grab the lock in the middle of the write. Since rindex entries are not an even divisor of blocks, that other process may see partial entries. On grows, this is fine. The process can simply ignore the partial entries. Previously, the code withdrew when it saw partial entries. Now it simply ignores them. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
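
The arithmetic behind "ignore the partial entries" can be sketched as follows. The entry size of 96 bytes and both function names are assumptions made for illustration. What matters is only that the entry size does not divide the block size evenly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk size of one rindex entry, for illustration only. */
#define RINDEX_ENTRY_SIZE 96u

/* Number of complete entries in a file of file_size bytes.
 * Integer division drops any trailing partial entry. */
uint64_t complete_rindex_entries(uint64_t file_size)
{
    return file_size / RINDEX_ENTRY_SIZE;
}

/* Bytes that belong to an entry which has not been fully written yet. */
uint64_t partial_rindex_bytes(uint64_t file_size)
{
    return file_size % RINDEX_ENTRY_SIZE;
}
```

Under these assumptions, a reader that sees the file after one 4096-byte block has been written finds 42 complete entries plus 64 bytes of a 43rd, and simply skips those trailing bytes.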
-
- 31 Jan, 2010 2 commits
-
-
Committed by Ryusuke Konishi
This fixes incorrect usage of the nilfs_segctor_confirm() test function in nilfs_segctor_destroy(); nilfs_segctor_confirm() returns zero if the filesystem is not clean, so its use in nilfs_segctor_destroy() needs inversion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
Committed by Chuck Ebbert
Fix two bugs in the bio integrity code: use_bip_pool() always returns 0 because it checks against the wrong limit, causing the mempool to be used only when regular allocation fails. When the mempool is used as a fallback we don't free the data properly. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 30 Jan, 2010 1 commit
-
-
Committed by Linus Torvalds
'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
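
The split described above follows a general two-phase pattern: a first phase that may fail and leaves everything untouched on failure, then a second phase that cannot fail. The toy model below is not kernel code. `struct proc_env`, `flush_old`, `setup_new`, `do_exec` and the `simulate_failure` switch are all invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical process state; not the kernel's task_struct. */
struct proc_env {
    bool old_flushed;    /* old executable environment has been torn down */
    int personality;     /* currently selected personality */
    int env_built_for;   /* personality the new environment was set up with */
};

/* Phase 1: may fail. On failure nothing about the process has changed. */
int flush_old(struct proc_env *p, bool simulate_failure)
{
    if (simulate_failure)
        return -ENOMEM;
    p->old_flushed = true;            /* point of no return */
    return 0;
}

/* Phase 2: cannot fail; builds the new environment from current state. */
void setup_new(struct proc_env *p)
{
    p->env_built_for = p->personality;
}

/* Caller: the personality is switched only between the two phases. */
int do_exec(struct proc_env *p, int new_personality, bool simulate_failure)
{
    int ret = flush_old(p, simulate_failure);

    if (ret < 0)
        return ret;                   /* old personality still in effect */
    p->personality = new_personality;
    setup_new(p);                     /* new environment sees the new value */
    return 0;
}
```

Because the personality is assigned only after the fallible phase has succeeded, a failed exec leaves the old personality in place, and the setup phase always sees the new one.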
-
- 29 Jan, 2010 8 commits
-
-
Committed by Josef Bacik
If you have a disk failure in RAID1 and then add a new disk to the array, and then try to remove the missing volume, it will fail. The reason is the sanity check only looks at the total number of rw devices, which is just 2 because we have 2 good disks and 1 bad one. Instead check the total number of devices in the array to make sure we can actually remove the device. Tested this with a failed disk setup and with this test we can now run btrfs-vol -r missing /mount/point and it works fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
Hit this problem while testing RAID1 failure stuff. open_bdev_exclusive returns ERR_PTR(), not NULL. So handle the return value properly. This is important: if you accidentally specify a device that doesn't exist when trying to add a new device to an array, you will panic the box dereferencing bdev. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
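
To show why a NULL check misses this failure, here is a user-space re-creation of the kernel's error-pointer idiom. `ERR_PTR`, `PTR_ERR` and `IS_ERR` mirror the kernel helpers of the same names, but are reimplemented here as plain functions. `open_device` and `add_device` are hypothetical stand-ins, not btrfs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space re-creation of the kernel's error-pointer helpers:
 * small negative errno values are encoded at the top of the address space. */
void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for open_bdev_exclusive(): on failure it returns
 * an error pointer, never NULL. */
void *open_device(bool exists)
{
    static int device;

    return exists ? (void *)&device : ERR_PTR(-ENOENT);
}

/* Correct caller: test with IS_ERR(), not against NULL. */
int add_device(bool exists)
{
    void *bdev = open_device(exists);

    if (IS_ERR(bdev))
        return (int)PTR_ERR(bdev);   /* propagate the encoded errno */
    return 0;                        /* bdev is safe to dereference here */
}
```

An error pointer is non-NULL, so a caller that only checks `bdev == NULL` would go on to dereference it.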
-
Committed by Josef Bacik
If a RAID setup has chunks that span multiple disks, and one of those disks has failed, btrfs_chunk_readonly will return 1 since one of the disks in that chunk's stripes is dead and therefore not writeable. So instead if we are in degraded mode, return 0 so we can go ahead and allocate stuff. Without this patch all of the block groups in a RAID1 setup will end up read-only, which will mean we can't add new disks to the array since we won't be able to make allocations. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
This patch reverts commit 6c090a11, since it introduces a problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Yang Hongyang
In btrfs_init_acl() the cloned acl is not released. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Aneesh Kumar K.V
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate. Even though we allocate more, we should be updating inode i_size as per the arguments passed. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Miao Xie
This patch removes tree_search() in extent_map.c because it is not called by anything. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
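
The difference between the two mount modes can be modelled with a small decision function. This is only a sketch of the behaviour described in the message: `struct file_state`, `should_compress` and `note_result` are hypothetical names, and the real btrfs back-off heuristic may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state; not the btrfs inode. */
struct file_state {
    bool nocompress;   /* set once compression failed to shrink the data */
};

/* Decide whether to try compressing the next write. */
bool should_compress(const struct file_state *f, bool force)
{
    if (force)
        return true;           /* compress-force: always try */
    return !f->nocompress;     /* compress: stop after a bad result */
}

/* Record the outcome of one compression attempt. */
void note_result(struct file_state *f, size_t raw, size_t compressed)
{
    if (compressed >= raw)
        f->nocompress = true;  /* no gain: back off for future writes */
}
```

In this model, one incompressible extent is enough to switch off compression for the rest of the file in the default mode, which is the case compress-force is meant to cover for large files with mixed content.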
-
- 28 Jan, 2010 2 commits
-
-
Committed by Dmitry Monakhov
We have to properly decrease bi_size in order for merge_bvec_fn to return the right result. Otherwise this results in false merge rejections for two absolutely valid bio_vecs. This may cause a significant performance penalty, for example when fs_block_size == 1k and the block device is raid0 with a small chunk_size = 8k. Then it is impossible to merge the 7th fs-block into a bio which already has 6 fs-blocks. Cc: <stable@kernel.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Frederic Weisbecker
Vmalloc is called to allocate journal->j_cnode_free_list but we hold the reiserfs lock at this time, which raises a {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion. Just drop the reiserfs lock at this time, as it's not even needed but kept for paranoid reasons. This fixes:

    [ INFO: inconsistent lock state ]
    2.6.33-rc5 #1
    ---------------------------------
    inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes:
     (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11118c8>] reiserfs_write_lock_once+0x28/0x50
    {RECLAIM_FS-ON-W} state was registered at:
      [<c104ee32>] mark_held_locks+0x62/0x90
      [<c104eefa>] lockdep_trace_alloc+0x9a/0xc0
      [<c108f7b6>] kmem_cache_alloc+0x26/0xf0
      [<c108621c>] __get_vm_area_node+0x6c/0xf0
      [<c108690e>] __vmalloc_node+0x7e/0xa0
      [<c1086aab>] vmalloc+0x2b/0x30
      [<c110e1fb>] journal_init+0x6cb/0xa10
      [<c10f90a2>] reiserfs_fill_super+0x342/0xb80
      [<c1095665>] get_sb_bdev+0x145/0x180
      [<c10f68e1>] get_super_block+0x21/0x30
      [<c1094520>] vfs_kern_mount+0x40/0xd0
      [<c1094609>] do_kern_mount+0x39/0xd0
      [<c10aaa97>] do_mount+0x2c7/0x6d0
      [<c10aaf06>] sys_mount+0x66/0xa0
      [<c16198a7>] mount_block_root+0xc4/0x245
      [<c1619a81>] mount_root+0x59/0x5f
      [<c1619b98>] prepare_namespace+0x111/0x14b
      [<c1619269>] kernel_init+0xcf/0xdb
      [<c100303a>] kernel_thread_helper+0x6/0x1c
    irq event stamp: 63236801
    hardirqs last enabled at (63236801): [<c134e7fa>] __mutex_unlock_slowpath+0x9a/0x120
    hardirqs last disabled at (63236800): [<c134e799>] __mutex_unlock_slowpath+0x39/0x120
    softirqs last enabled at (63218800): [<c102f451>] __do_softirq+0xc1/0x110
    softirqs last disabled at (63218789): [<c102f4ed>] do_softirq+0x4d/0x60

    other info that might help us debug this:
    2 locks held by kswapd0/313:
     #0: (shrinker_rwsem){++++..}, at: [<c1074bb4>] shrink_slab+0x24/0x170
     #1: (&type->s_umount_key#19){++++..}, at: [<c10a2edd>] shrink_dcache_memory+0xfd/0x1a0

    stack backtrace:
    Pid: 313, comm: kswapd0 Not tainted 2.6.33-rc5 #1
    Call Trace:
     [<c134db2c>] ? printk+0x18/0x1c
     [<c104e7ef>] print_usage_bug+0x15f/0x1a0
     [<c104ebcf>] mark_lock+0x39f/0x5a0
     [<c104d66b>] ? trace_hardirqs_off+0xb/0x10
     [<c1052c50>] ? check_usage_forwards+0x0/0xf0
     [<c1050c24>] __lock_acquire+0x214/0xa70
     [<c10438c5>] ? sched_clock_cpu+0x95/0x110
     [<c10514fa>] lock_acquire+0x7a/0xa0
     [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
     [<c134f03f>] mutex_lock_nested+0x5f/0x2b0
     [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
     [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
     [<c11118c8>] reiserfs_write_lock_once+0x28/0x50
     [<c10f05b0>] reiserfs_delete_inode+0x50/0x140
     [<c10a653f>] ? generic_delete_inode+0x5f/0x150
     [<c10f0560>] ? reiserfs_delete_inode+0x0/0x140
     [<c10a657c>] generic_delete_inode+0x9c/0x150
     [<c10a666d>] generic_drop_inode+0x3d/0x60
     [<c10a5597>] iput+0x47/0x50
     [<c10a2a4f>] dentry_iput+0x6f/0xf0
     [<c10a2af4>] d_kill+0x24/0x50
     [<c10a2d3d>] __shrink_dcache_sb+0x21d/0x2b0
     [<c10a2f0f>] shrink_dcache_memory+0x12f/0x1a0
     [<c1074c9e>] shrink_slab+0x10e/0x170
     [<c1075177>] kswapd+0x477/0x6a0
     [<c1072d10>] ? isolate_pages_global+0x0/0x1b0
     [<c103e160>] ? autoremove_wake_function+0x0/0x40
     [<c1074d00>] ? kswapd+0x0/0x6a0
     [<c103de6c>] kthread+0x6c/0x80
     [<c103de00>] ? kthread+0x0/0x80
     [<c100303a>] kernel_thread_helper+0x6/0x1c

Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Chris Mason <chris.mason@oracle.com>
-
- 27 Jan, 2010 9 commits
-
-
Committed by Al Viro
If 9P ->get_sb() fails late (at root inode or root dentry allocation), we'll hit its ->kill_sb() with NULL ->s_root. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
If we'd just got success from it, vfsmount won't be NULL. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
double iput(), leaks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Error handling in that sucker got broken back in 2003. If a function returns 0 on failure, it's not nice to add return -EINVAL into it. Adding return 1 on other failure exits is also not a good thing (and yes, the original success exits with 1 and some of the failure exits with 0 are still there; so is the original logic in callers). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
A couple of fields in affs_sb_info are used in follow_link() and symlink() for handling AFFS "absolute" symlinks. They need locking against affs_remount() updates. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Greg Kroah-Hartman
Commit 70362511 exposed that f_modown() should call write_lock_irqsave instead of just write_lock_irq, because a caller could have a spinlock held and it would not be good to re-enable interrupts. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tavis Ormandy <taviso@google.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
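
The difference between the `_irq` and `_irqsave` variants can be illustrated with a toy model in which a single boolean stands in for the CPU's interrupt-enable flag. Nothing here is kernel API: all function names are invented, and no real locking is performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag; a stand-in for real hardware state. */
static bool irqs_enabled = true;

/* _irq style: unconditionally disable on lock, unconditionally enable
 * on unlock. */
void lock_irq(void)   { irqs_enabled = false; }
void unlock_irq(void) { irqs_enabled = true; }

/* _irqsave style: remember the previous state and restore exactly that. */
bool lock_irqsave(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}
void unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Caller already runs with interrupts off; returns the state afterwards. */
bool nested_with_irq(void)
{
    irqs_enabled = false;        /* outer spinlock held, interrupts off */
    lock_irq();
    unlock_irq();
    return irqs_enabled;         /* interrupts were re-enabled too early */
}

bool nested_with_irqsave(void)
{
    bool flags;

    irqs_enabled = false;        /* outer spinlock held, interrupts off */
    flags = lock_irqsave();
    unlock_irqrestore(flags);
    return irqs_enabled;         /* caller's state is preserved */
}
```

In this model the `_irq` pair turns interrupts back on while the outer caller still expects them off, whereas the save/restore pair leaves the caller's state untouched.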
-
Committed by Trond Myklebust
Even if the server is crazy, we should be able to mark the stateid as being bad, to ensure it gets recovered. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
-